The sorted container types are implemented based on a couple observations. The first is that Python lists are fast, really fast. They have great characteristics for memory management and random access. The second is that bisect.insort is fast. This is somewhat counter-intuitive since it ultimately involves shifting a series of items in a list. But modern processors do this really well. A lot of time has been spent optimizing memcopy/memmove-like operations both in hardware and software.

But using only one list and bisect.insort would produce sluggish behavior for lengths exceeding ten thousand. So the implementation of SortedList uses a list of lists to store values. In this way, inserting or deleting is most often performed on a short list. Only rarely does a new list need to be added or deleted.

SortedList maintains three internal variables: _lists, _maxes, and _index. The first is simply the list of lists. Each element is a list containing items. The second contains the maximum value in each of the lists. This is used for fast binary-search. The last maintains a tree of pair-wise sums of the lengths of the lists.

Lists are kept balanced using the _load factor. If an internal list’s length
exceeds double the load then it is split in two. Likewise at half the load it is
combined with its neighbor. By default this factor is 1000 which seems to work
well for lengths up to ten million. Lengths above that are recommended a load
factor that is the square root of the average length (although you will probably
exhaust the memory of your machine before that point). Experimentation is also
recommended. A *load factor performance comparison* is
also provided.

Finding an element is a two step process. First the _maxes list is bisected which yields the index of a short sorted list. Then that list is bisected for the index of the element.

Compared to tree-based implementations, using lists of lists has a few advantages based on memory usage.

1. Most insertion/deletion doesn’t require allocating or freeing memory. This can be a big win as it takes a lot of strain off the garbage collector and memory system.

2. Pointers to elements are packed densely. A traditional tree-based implementation would require two pointers (left/right) to child nodes. Arrays have no such overhead. This benefits the hardware’s memory architecture and better leverages caching.

3. The memory overhead per item is effectively a pointer to the item. Binary tree implementations must add at least two more pointers per item.

4. Iteration is extremely fast as indexing sequential elements is a strength of modern processors.

Traditional tree-based designs have better big-O notation but that ignores the realities of today’s software and hardware.

Indexing uses the _index list which operates as a tree of pair-wise sums of the lengths of the lists. The tree is maintained as a dense binary tree. It’s easiest to explain with an example. Suppose _lists contains sublists with these lengths (in this example, we assume the _load parameter is 4):

map(len, _lists) -> [3, 5, 4, 5, 6]

Given these lengths, the first row in the index is the pair-wise sums:

[8, 9, 6, 0]

We pad the first row with zeros to make its length a power of 2. The next rows of sums work similarly:

[15, 6] [21]

Then all the rows are concatenated in reverse order so that the index is finally:

[21, 15, 6, 8, 9, 6, 0, 3, 5, 4, 5, 6]

With this list, we can efficiently compute the index of an item in a sublist and, vice-versa, find an item given an index. Details of the algorithms to do so are contained in the docstring for SortedList._loc and SortedList._pos. Maintaining the position index in this way has several advantages:

- It’s easy to traverse to children/parent. The children of a position in the _index are at (pos * 2) + 1 and (pos * 2) + 2. The parent is at (pos - 1) // 2. We can even identify left/right-children easily. Each left-child is at an odd index and each right-child is at an even index.
- It’s not built unless needed. If no indexing occurs, the memory and time accounting for position is skipped.
- It’s fast to build. Calculating sums pair-wise and concatenating lists can all be done within C-routines in the Python interpreter.
- It’s space efficient. The whole index is no more than twice the size of the length of the _lists and contains only integers.
- It’s easy to update. Adding or removing an item involves incrementing or decrementing only log(len(_index)) items in the index. The only caveat to this is when a new sublist is created/deleted. In those scenarios the entire index is deleted and not rebuilt until needed.

The construction and maintainence of the index is unusual compared to other designs described in research. Whether the design is novel, I (Grant Jenks) do not know. I based the dense-tree structure on the efficiency of the heapq module in Python.

Each sorted container has a function named _check for verifying consistency. This function details the data-type invariants.