Sorted Containers Introduction

Installation

The first step to using any software library is getting it properly installed. There are several ways to install Sorted Containers.

The recommended way to install Sorted Containers is using the pip command:

$ python3 -m pip install sortedcontainers

You may also choose instead to use the newer pipenv command:

$ pipenv install sortedcontainers

These commands will install the latest version of Sorted Containers from the Python Package Index.

Sorted Containers is actively developed on GitHub, where the code is open source. You may choose to install directly from the source repository. First, you will need a copy of the sources. The recommended way to get a copy of the source repository is to clone the repository from GitHub:

$ git clone git://github.com/grantjenks/python-sortedcontainers.git

You may also choose instead to download the Sorted Containers tarball or download the Sorted Containers zipball.

Once you have a copy of the sources, you can embed it in your Python package, or install it into your site-packages using the command:

$ python3 setup.py install

Sorted Containers is available in Debian distributions as python3-sortedcontainers and python-sortedcontainers.

Sorted Containers is looking for a CentOS/RPM package maintainer. If you can help, please open an issue in the Sorted Containers Issue Tracker.

Sorted List

At the core of Sorted Containers is the mutable sequence data type SortedList. The SortedList maintains its values in ascending sort order. As with Python’s built-in list data type, SortedList supports duplicate elements and fast random-access indexing.

>>> from sortedcontainers import SortedList
>>> sl = SortedList()

Values may be added to a SortedList using either SortedList.update() or SortedList.add(). When doing so, the list remains sorted.

>>> sl.update([5, 1, 3, 4, 2])
>>> sl
SortedList([1, 2, 3, 4, 5])
>>> sl.add(0)
>>> sl
SortedList([0, 1, 2, 3, 4, 5])

Several methods may be used to remove elements by value or by index. The methods SortedList.discard() and SortedList.remove() remove elements by value. And the methods SortedList.pop() and SortedList.__delitem__() remove elements by index. All values may be removed using SortedList.clear().

>>> sl.remove(0)
>>> sl.discard(1)
>>> sl
SortedList([2, 3, 4, 5])
>>> sl.pop()
5
>>> del sl[1]
>>> sl
SortedList([2, 4])
>>> sl.clear()

Because SortedList is sorted, it supports efficient lookups by value or by index. When accessing values by index, the SortedList can be used as an order statistic tree. Rather than performing a linear scan, values can be found in logarithmic time by repeatedly bisecting the internal tree structure. Methods for looking up values are SortedList.__contains__(), SortedList.count(), SortedList.index(), SortedList.bisect_left(), SortedList.bisect_right(), and SortedList.__getitem__().

>>> sl = SortedList('abbcccddddeeeee')
>>> 'f' in sl
False
>>> sl.count('e')
5
>>> sl.index('c')
3
>>> sl[3]
'c'
>>> sl.bisect_left('d')
6
>>> sl.bisect_right('d')
10
>>> sl[6:10]
['d', 'd', 'd', 'd']

Several methods can be used to iterate values in a SortedList. There are the typical sequence iterators: SortedList.__iter__() and SortedList.__reversed__(). There are also methods for iterating by value or by index using SortedList.irange() and SortedList.islice(). These methods produce iterators that are faster than repeatedly indexing the SortedList.

>>> sl = SortedList('acegi')
>>> list(iter(sl))
['a', 'c', 'e', 'g', 'i']
>>> list(reversed(sl))
['i', 'g', 'e', 'c', 'a']
>>> list(sl.irange('b', 'h'))
['c', 'e', 'g']
>>> list(sl.islice(1, 4))
['c', 'e', 'g']

A SortedList also supports the typical sequence operators SortedList.__add__() and SortedList.__mul__() as well as their in-place counterparts.

>>> sl = SortedList('abc')
>>> sl + sl
SortedList(['a', 'a', 'b', 'b', 'c', 'c'])
>>> sl * 3
SortedList(['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'])
>>> sl += 'de'
>>> sl
SortedList(['a', 'b', 'c', 'd', 'e'])
>>> sl *= 2
>>> sl
SortedList(['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd', 'e', 'e'])

Although SortedList implements most of the mutable sequence methods, there are five which are not implemented. Each of these methods assigns a value at an index which is not supported by SortedList.

>>> sl = SortedList('abcde')
>>> sl[2] = 'c'
Traceback (most recent call last):
  ...
NotImplementedError: use ``del sl[index]`` and ``sl.add(value)`` instead
>>> sl.reverse()
Traceback (most recent call last):
  ...
NotImplementedError: use ``reversed(sl)`` instead
>>> sl.append('f')
Traceback (most recent call last):
  ...
NotImplementedError: use ``sl.add(value)`` instead
>>> sl.extend(['f', 'g', 'h'])
Traceback (most recent call last):
  ...
NotImplementedError: use ``sl.update(values)`` instead
>>> sl.insert(5, 'f')
Traceback (most recent call last):
  ...
NotImplementedError: use ``sl.add(value)`` instead

Comparison with SortedList uses lexicographical ordering as with other sequence types.

Refer to the Sorted List documentation for additional parameters, more examples, and descriptions of runtime complexity.

Sorted-key List

The Sorted Containers project also maintains a specialized sorted-list-like type that accepts a key-parameter as found in Python’s built-in sorted function. SortedKeyList provides the same functionality as SortedList but maintains the order of contained values based on the applied key-function. This simplifies the pattern of boxing and un-boxing which would otherwise be required.

>>> from operator import neg
>>> from sortedcontainers import SortedKeyList
>>> skl = SortedKeyList(key=neg)

The key function extracts a comparison key for ordering items in the list. In our example above we apply the negation operator. In the example above, a sorted list of integers would be ordered in descending sort order.

You can also construct a SortedKeyList using the SortedList type by passing a key-function to the initializer.

>>> from sortedcontainers import SortedList
>>> values = SortedList([1, 2, 3, 4, 5], key=neg)
>>> values
SortedKeyList([5, 4, 3, 2, 1], key=<built-in function neg>)
>>> isinstance(values, SortedList)
True
>>> issubclass(SortedKeyList, SortedList)
True
>>> values.key
<built-in function neg>

SortedKeyList adds three additional methods to the SortedList type. They are SortedKeyList.bisect_key_left(), SortedKeyList.bisect_key_right(), and SortedKeyList.irange_key(). Each of these methods accepts the key rather than the value for its SortedList counterpart.

>>> skl = SortedKeyList([1, 2, 3, 4, 5], key=neg)
>>> skl
SortedKeyList([5, 4, 3, 2, 1], key=<built-in function neg>)
>>> skl.bisect_key_left(-4.5)
1
>>> skl.bisect_key_right(-1.5)
4
>>> list(skl.irange_key(-4.5, -1.5))
[4, 3, 2]

Refer to the Sorted List documentation for additional parameters, more examples, and descriptions of runtime complexity.

Caveats

Sorted Containers data types have three requirements:

  1. The comparison value or key must have a total ordering.

  2. The comparison value or key must not change while the value is stored in the sorted container.

  3. If the key-function parameter is used, then equal values must have equal keys.

If any of these three requirements are violated then the warranty of Sorted Containers is void and it will not behave correctly. While it may be possible to design useful data types that do not have these requirements, these are the caveats of Sorted Containers and they match those of alternative implementations. Each of these requirements allow for optimizations and together they are an attempt to find the right tradeoff between functionality and performance.

Let’s look at some examples of what works and what doesn’t. In Python, all objects inherit from object which provides a default implementation of equality. In pseudocode, the object type looks something like:

>>> class Object:
...     def __eq__(self, other):
...         return id(self) == id(other)

The default implementation defines equality in terms of identity. Note that Object does not define comparison methods like less-than or greater-than. While Python objects are comparable by default in Python 2, the feature was removed in Python 3. Instances of object can not be stored in a SortedList.

We can extend this example by creating our own Record data type with name and rank attributes.

>>> class Record(object):
...     def __init__(self, name, rank):
...         self.name = name
...         self.rank = rank
...     def __eq__(self, other):
...         return self.name == other.name

The Record type defines equality in terms of its name attribute which may be thought of as a record identifier. Each Record also has a rank which would be useful for ranking records in sorted order. The Record type does not define comparison operators and so can not be stored in a SortedList.

>>> alice1 = Record('alice', 1)
>>> bob2 = Record('bob', 2)
>>> carol3 = Record('carol', 3)

Since the rank attribute is intended for ordering records, the key-function presents a tempting but invalid use for ordering records:

>>> get_rank = lambda record: record.rank
>>> sl = SortedList([alice1, bob2, carol3], key=get_rank)

Although the sorted list now appears to work, the requirements have been invalidated. Specifically #3, since it is now possible for equal values to have inequal keys:

>>> bob4 = Record('bob', 4)
>>> bob2 == bob4  # Equal values.
True
>>> get_rank(bob2) == get_rank(bob4)
False
>>> # ^-- Equal values should have equal keys.
>>> bob4 in sl  # <-- Here's the problem. BAD!
False

In the example above, bob4 can not be found in sl because although bob2 and bob4 are equal, the corresponding keys are not equal. The mistake is a bit easier to see without the key-function. The key-function defined comparisons between records like so:

>>> class Record(object):
...     def __init__(self, name, rank):
...         self.name = name
...         self.rank = rank
...     def __eq__(self, other):
...         return self.name == other.name
...     def __lt__(self, other):
...         return self.rank < other.rank

Written as above, equality between objects is more clearly seen as unrelated to ordering between objects. This is the most common mistake made when using Sorted Containers. The Record type now also violates requirement #1 because equal instances can also be strictly less than each other:

>>> bob2 = Record('bob', 2)
>>> bob4 = Record('bob', 4)
>>> bob2 == bob4
True
>>> bob2 < bob4  # <-- Here's the problem. BAD!
True

In the above example, bob2 and bob4 are equal to each other while bob2 is also strictly less than bob4. The Record type therefore does not have a total ordering. In pseudocode the three requirements for a total ordering are:

  1. If a <= b and b <= a then a == b.

  2. And if a <= b and b <= c then a <= c.

  3. And a <= b or b <= a.

Intuitively, a total ordering is best understood through integer and string types. Each of these common types defines a total ordering and can be used for comparisons in Sorted Containers. Of the built-in types in Python, these have a total ordering:

  1. Integers

  2. Strings and bytes.

  3. All floating-point numbers except float('nan').

  4. Sequences like list and tuple of values with a total ordering.

There are also some built-in Python types and values which lack a total ordering:

  1. Sets and frozensets (not a total ordering).

  2. float('nan') (not a total ordering).

  3. Mapping types (not comparable, changed in Python 3).

The best way to fix the Record type is to define equality and comparison in terms of the same fields.

>>> class Record(object):
...     def __init__(self, name, rank):
...         self.name = name
...         self.rank = rank
...     def _cmp_key(self):
...         return (self.rank, self.name)
...     def __eq__(self, other):
...         return self._cmp_key() == other._cmp_key()
...     def __lt__(self, other):
...         return self._cmp_key() < other._cmp_key()

The example above uses a comparison-key method named _cmp_key and the lexicographical ordering semantics of tuples to define equality and comparison in terms of the rank and name fields. It would also be possible to omit the Record.__lt__ method and instead use a key-function which called record._cmp_key(). But the key-function will take more memory and be slower as it uses a SortedKeyList which stores a reference to the key for every value.

The Record example above is complicated by equality defined as those objects with equal names. When using equality inherited from object, that is, defined in terms of instance identity, the situation is simplified. For example:

>>> class Record(object):
...     def __init__(self, name, rank):
...         self.name = name
...         self.rank = rank
...     def __lt__(self, other):
...         return self.rank < other.rank

The Record type now can be stored in SortedList because equality based on instance identity guarantees the rank attributes are equal so long as its value has a total ordering.

>>> alice1 = Record('alice', 1)
>>> bob2 = Record('bob', 2)
>>> carol3 = Record('carol', 3)
>>> bob4 = Record('bob', 4)
>>> bob2 != bob4  # <-- Different instances, so unequal. GOOD!
True
>>> sl = SortedList([alice1, bob2, carol3, bob4])
>>> bob2 in sl
True
>>> bob4 in sl
True

The above example displays that all three requirements are followed:

  1. The comparison key, rank, is an integer, which has a total ordering.

  2. The comparison key does not change while the value is stored in the sorted container.

  3. Equal Record instances have equal rank attributes based on instance identity.

These examples can be summarized in two pieces of advice:

  1. If the data type defines its own equality, that is __eq__, then make sure the comparison methods or key-function define a total ordering and equal values have equal comparison keys.

  2. If the data type does not define equality then it inherits equality from object and uses instance identity. Compare objects using comparison methods like __lt__ or the key-function. The compared values must have a total ordering.

Another invalid use case of SortedKeyList occurs when the key-function returns a different comparison key for a given value while the value is stored in the sorted container.

>>> from random import random
>>> key_func = lambda value: random()
>>> sl = SortedList([1, 2, 3, 4, 5], key=key_func)
>>> # ^-- Corrupt sorted list.
>>> 3 in sl
False
>>> key_func(1) == key_func(1)  # <-- Here's the problem. BAD!
False

The example above violates two requirements of sorted lists: equal values must have equal keys and the key function must return the same key for the given value while the value is stored in the sorted container. The random key-function does not reliably return the same key for a given value. The order of values in a sorted container can be made arbitrary and reliable by changing the key-function like so:

>>> from random import seed
>>> def key_func(value):
...     "Key-function for arbitrary but reliable order."
...     seed(value)
...     return random()

Another way the problem above manifests itself is when the comparison key of an object is mutated while the object is stored in the SortedList. Using the Record definition from above:

>>> class Record(object):
...     def __init__(self, name, rank):
...         self.name = name
...         self.rank = rank
...     def __lt__(self, other):
...         return self.rank < other.rank
>>> alice1 = Record('alice', 1)
>>> bob2 = Record('bob', 2)
>>> carol3 = Record('carol', 3)
>>> sl = SortedList([alice1, bob2, carol3])
>>> bob2 in sl
True

Python objects are typically mutable so while the above example works and is correct, there’s nothing preventing a modification to the rank of a Record. If the rank is modified while the Record instance is stored in the SortedList, then the container is corrupted.

>>> bob2.rank = 20  # <-- Here's the problem. BAD!
>>> bob2 in sl
False

The Record instance bob2 can no longer be found in the SortedList because modifying the rank changed its sort order position without updating its position in sl. To modify the sorted order position, remove the value, update it, and then add the value back again.

Similar limitations also apply to Python’s dict data type which can be corrupted by modifying the hash of a key while the item is stored in the dict. The hashing protocol also requires that equal keys have equal hashes.

One final caveat: indexing a sorted list is fast but not as fast as indexing Python’s built-in list data type. The runtime complexity for indexing a sorted list is O(log(n)) while it is O(1) for Python’s built-in list data type. Indexing a sorted list requires building and maintaining a positional index which is not built if not necessary. The index is fast and light on memory overhead but avoid positional indexing if able. Indexing at the front or back, indexes like 0 and -1, is optimized and will not require the positional index.

Sorted Dict

Built atop Python’s built-in dict data type and SortedList is the mutable mapping data type SortedDict. Sorted dict keys are maintained in sorted order. The design of SortedDict is simple: sorted dict inherits from dict to store items and maintains a sorted list of keys. SortedDict keys must be hashable and comparable. The hash and total ordering of keys must not change while they are stored in the SortedDict.

>>> from sortedcontainers import SortedDict
>>> sd = SortedDict()

Items may be added to a SortedDict using SortedDict.__setitem__(), SortedDict.update() or SortedDict.setdefault(). When doing so, the keys remain sorted.

>>> sd['e'] = 5
>>> sd['b'] = 2
>>> sd
SortedDict({'b': 2, 'e': 5})
>>> sd.update({'d': 4, 'c': 3})
>>> sd
SortedDict({'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> sd.setdefault('a', 1)
1
>>> sd
SortedDict({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})

Several methods may be used to remove items by key or by index. The methods SortedDict.__delitem__() and SortedDict.pop() remove items by key. And the method SortedDict.popitem() removes items by index.

>>> del sd['d']
>>> sd.pop('c')
3
>>> sd.popitem(index=-1)
('e', 5)
>>> sd
SortedDict({'a': 1, 'b': 2})

Because SortedDict uses both dict and SortedList, there are many methods for looking up keys from each type. The mapping interface supports SortedDict.__getitem__(), SortedDict.__contains__(), SortedDict.get(), and SortedDict.peekitem().

>>> sd['b']
2
>>> 'c' in sd
False
>>> sd.get('z') is None
True
>>> sd.peekitem(index=-1)
('b', 2)

The sequence interface supports the same lookup methods as those provided by SortedList.

>>> sd.bisect_right('b')
2
>>> sd.index('a')
0
>>> list(sd.irange('a', 'z'))
['a', 'b']

The keys, items, and values views also support both set semantics and sequence semantics with optimized methods for lookups by index.

>>> keys = sd.keys()
>>> keys[0]
'a'
>>> items = sd.items()
>>> items[-1]
('b', 2)
>>> values = sd.values()
>>> values[:]
[1, 2]

The SortedDict initializer supports one additional position argument. The positional argument must come before any sequences, mappings, or keyword arguments used to initialize the items in a SortedDict. The first positional argument is an optional callable key-function used to extract a comparison key from the keys of the SortedDict. For example, to construct a SortedDict with integer keys in descending order:

>>> sd = SortedDict(neg, enumerate('abc', start=1))
>>> sd
SortedDict(<built-in function neg>, {3: 'c', 2: 'b', 1: 'a'})
>>> keys = sd.keys()
>>> list(keys)
[3, 2, 1]

Because SortedDict inherits from dict, the __missing__ method can be used to give missing keys a default value. Customizing the __missing__ method creates a kind of defaultdict like that in the collections module.

>>> class DefaultSortedDict(SortedDict):
...     def __missing__(self, key):
...         value = 0
...         self[key] = value
...         return value
>>> dsd = DefaultSortedDict()
>>> dsd['z']
0

Refer to the Sorted Dict documentation for additional parameters, more examples, and descriptions of runtime complexity.

Sorted Set

Standing on the shoulder’s of Python’s built-in set data type and SortedList is the mutable set data type SortedSet. Sorted set values are maintained in sorted order. The design of SortedSet is simple: sorted set uses Python’s built-in set for set-operations and maintains a sorted list of values. Values stored in a SortedSet must be hashable and comparable. The hash and total ordering of values must not change while they are stored in the SortedSet.

>>> from sortedcontainers import SortedSet
>>> ss = SortedSet()

SortedSet implements optimized versions of the core mutable set methods: SortedSet.__contains__(), SortedSet.add(), SortedSet.update(), SortedSet.discard(), and the more strict SortedSet.remove().

>>> ss.add('c')
>>> ss.add('a')
>>> ss.add('b')
>>> ss
SortedSet(['a', 'b', 'c'])
>>> 'c' in ss
True
>>> ss.discard('a')
>>> ss.remove('b')
>>> _ = ss.update('def')
>>> ss
SortedSet(['c', 'd', 'e', 'f'])

SortedSet also behaves like a sequence with SortedSet.__getitem__() and SortedSet.__reversed__() methods. And includes the mutable sequence methods SortedSet.__delitem__() and SortedSet.pop().

>>> ss[0]
'c'
>>> list(reversed(ss))
['f', 'e', 'd', 'c']
>>> del ss[0]
>>> ss.pop(index=-1)
'f'
>>> ss
SortedSet(['d', 'e'])

Set-operation methods like SortedSet.difference(), SortedSet.intersection(), SortedSet.symmetric_difference(), and SortedSet.union(), as well as their in-place counterparts and operators are all supported.

>>> abcd = SortedSet('abcd')
>>> cdef = SortedSet('cdef')
>>> abcd.difference(cdef)
SortedSet(['a', 'b'])
>>> abcd.intersection(cdef)
SortedSet(['c', 'd'])
>>> abcd.symmetric_difference(cdef)
SortedSet(['a', 'b', 'e', 'f'])
>>> abcd.union(cdef)
SortedSet(['a', 'b', 'c', 'd', 'e', 'f'])
>>> abcd | cdef
SortedSet(['a', 'b', 'c', 'd', 'e', 'f'])
>>> abcd |= cdef
>>> abcd
SortedSet(['a', 'b', 'c', 'd', 'e', 'f'])

Several SortedList methods are also exposed on SortedSet objects like SortedList.bisect(), SortedList.index(), SortedList.irange(), and SortedList.islice() just as with SortedDict.

>>> ss = SortedSet('abcdef')
>>> ss.bisect('d')
4
>>> ss.index('f')
5
>>> list(ss.irange('b', 'e'))
['b', 'c', 'd', 'e']
>>> list(ss.islice(-3))
['d', 'e', 'f']

Like SortedList an additional key parameter can be used to initialize the SortedSet with a callable which is used to extract a comparison key.

>>> ss = SortedSet([1, 2, 3], key=neg)
>>> ss
SortedSet([3, 2, 1], key=<built-in function neg>)

Sorted set comparisons use subset and superset relations. Two sorted sets are equal if and only if every element of each sorted set is contained in the other (each is a subset of the other). A sorted set is less than another sorted set if and only if the first sorted set is a proper subset of the second sorted set (is a subset, but is not equal). A sorted set is greater than another sorted set if and only if the first sorted set is a proper superset of the second sorted set (is a superset, but is not equal). The comparison semantics of sorted sets do not define a total ordering.

Refer to the Sorted Set documentation for additional parameters, more examples, and descriptions of runtime complexity.

Migrating

The performance comparison page documents several alternative implementations of the sorted types described. Some of those projects have deprecated themselves and now recommend Sorted Containers instead. When migrating from other projects, there are a couple of things to keep in mind.

Sorted Containers went through a major version change between version one and version two. The goal of the change was to adopt Python 3 semantics wherever possible:

  1. Several SortedList methods now raise NotImplementedError: SortedList.__setitem__(), SortedList.append(), and SortedList.extend(). Use SortedList.add() or SortedList.update() instead.

  2. SortedDict has adopted the Python 3 semantics of dict views as the default. The SortedDict.keys(), SortedDict.items(), and SortedDict.values() methods now return views with support for optimized indexing. The view objects also implement set and sequence protocol methods. Prefer using the SortedDict methods directly for better performance.

  3. Some type and parameter names were changed. SortedListWithKey was renamed to SortedKeyList but an alias remains for compatibility. Several methods which accepted a val parameter now accept value for better readability.

The Sorted Containers Release History documents all the changes made in every version in the history of the project. The Version 2 release notes detail all the changes made.

The blist project remains the most similar as its API was the original inspiration for Sorted Containers. The main difference has always been the SortedList.pop() method. The blist project pops the first element by default while Sorted Containers pops the last element and matches the API of Python’s built-in list data type. The sorted dict data type in blist also continues to use the old Python 2 semantics for dict views.

The bintrees project now recommends using Sorted Containers instead and has stopped development. The API differs significantly but the supported functionality is the same. The Tree object in bintrees is most similar to SortedDict. All of the mapping methods and set methods are available using either SortedDict or SortedKeysView. The slicing methods and previous/successor iterator methods correspond to SortedDict.irange() and the heap methods correspond to indexing with views like SortedKeysView.__getitem__().

The banyan project has data types similar to SortedDict and SortedSet. Most methods have a direct counterpart in Sorted Containers. But the frozen sorted dict and frozen sorted set data types have no direct comparison. The functionality of hashing can be implemented by inheriting and defining the __hash__ method. Do so with care, because the instance is still mutable. The banyan project also supports tree augmentation which can be useful in implementing interval trees or segment trees. There is no support for tree argumentation in Sorted Containers.