Today during the Pycon sprints, we had a discussion about which questions people like to ask in interviews. This lead to some interesting discussions and tests that show that that using a heapq can actually be faster than timsort in certain conditions.

# Some Classic Interview Questions

During this discussion, I was reminded of one of my boss Matt Billenstein's favorite questions:

Using any language, take a file withnlines, 5 columns a line, and sort it by the integer value of the second column.

Now this is not a particularly exciting or difficult one to solve, but it is surprising how many people get this question wrong. The answer is basically bound to running time of O(n *log* n), and memory usage of O(n). It can even be solved without ever leaving the shell:

sort -nk2 input.txt > output.txt

I, on the other hand, do really like a slight variation of this question:

Using any language, take this file withnlines, 5 columns a line, and sort it by the integer value of the second columnand output the 10 smallest lines.

Now this is still really easy to do. It can still be done in the command line:

sort -nk2 input.txt | head -n10 > output.txt

But its not necessarily the best way to do it.

# Now a crucial constraint

Adding the constraint that memory usage must be bound to the size of the output makes this problem a little harder. Now the memory must be O(m) where m is the number of lines I want in the output file (in this case O(10)). This means that you are only allowed to keep a list that is 10 items long in memory at any given time.

This is where my bash skills are no longer sufficient so I will switch to python. Now me and a colleague went back and forth over which would be faster.

- Inserting the object, in order, into the result array. (In-order Insertion)
- Appending the item to the array and calling sort on the array. (Lazy Insertion)

The respective runtime complexity for these two are O(nm) and O(nm *log* m), where n is the number of lines in the file and m is the size of the desired output file. So it would reason that the first would be faster, but the lazy insertion uses `timsort` and it is super optimized and written in C. One sure fire way to find out which is faster is to just test it.

The results of this are some what interesting:

So all this really tells us is that is that `timsort` is fast and written in C. Comparing it to something written in python is probably not even worth it, but we did get pretty close. I wonder if we can do better.

# Using The `heapq` For Speed

During working through these tests Honza Kral came up with an optimization for the in-order insertion. The heap in python is written in C, and it has faster run time complexity than timsort. We could store the list as a heap and keep a variable that is the size of the largest item in the heap. If the item we are trying to insert into the `heapq` is smaller than the largest item in the `heapq` we can do one poppush operation. This can be done in O( *log* m) time where m is the number of lines desired in the output file. If we do that for every line in the file (n) we get a total runtime of (n *log* m). For this very reason there seems to be a function in the standard library `heapq.nsmallest` designed to do just this.

The results:

* Violates the memory constraint but just for comparison

So as we expected it does better than both in-order insertion and lazy insertion. Just for fun we compared it to reading everything into memory and running `timsort` on it. Because `heapq` is O(n *log* m) and `timsort` is O(n *log* n) and m < n it makes sense that `heapq` runs faster, but it doesn't beat `timsort` by as much as the runtime complexity would imply.

# Comparing `heapq` vs `timsort`

With a large data set and a small desired output `heapq` will out perform `timsort` in both memory usage and run time complexity, but, `heapq` doesn't out perform `timsort` by the margin I thought it would because of what I can only assume is massive optimization of `timsort` in C. In fact as the magnitude difference to the input size to the output size decreases `timsort` will beat `heapq`:

So the main conclusion we can draw here is that `timsort` is massively optimized to so greatly out perform `heapq`. More than just making smart decision, but it must be optimizations done in the C code as well. But one thing we have to keep in mind is that `heapq` is using far less memory. In conclusion, the `heapq.nsmallest` is better than a sort and truncate, if you are concearned with memory usage or if your output is considerably smaller than your input.

# The Code

```
import random
import heapq
import time
time_a = []
time_b = []
time_c = []
time_d = []
def random_gen(size):
for i in xrange(size):
yield random.randint(0,size)
def a(nums, size_of_result_list):
# A
# In-order Insertion
# memory = O(size_of_result_list)
# time = O(size_of_array * size_of_result_list) *sort not done in C
t = time.time()
result_list = []
for i in nums:
length = len(result_list)
j = -1
if length:
while j >= (-1 - length):
if j == -1-length:
result_list.insert(0,i)
break;
if result_list[j] < i:
result_list.insert(length+j+1,i)
break;
j -= 1
if length+1 > size_of_result_list:
result_list = result_list[:size_of_result_list]
else:
result_list.append(i)
time_a.append(time.time() - t)
def c(nums, size_of_result_list):
# C
# Lazy insertion (generate the item, append it, sort it, trucate it)
# memory = O(size_of_result_list)
# time = O(size_of_array*size_of_result_list*log(size_of_result_list))
t = time.time()
result_list = []
for i in nums:
result_list.append(i)
result_list.sort()
result_list = result_list[:10]
time_c.append(time.time() - t)
def b(nums, size_of_result_list):
# B
# Read it all, sort it with timsort, truncate
# memory = O(size_of_array) *violates the first premise, but just for fun
# time = O(size_of_array*log(size_of_array)) *optimized and written in C
t = time.time()
nums = list(nums)
nums.sort()
result_list = nums[:size_of_result_list]
time_b.append(time.time() - t)
def d(nums, size_of_result_list):
# D
# HeapQ
# memory = O(size_of_result_list)
# time = O(size_of_array*log(size_of_array))
t = time.time()
result_list = heapq.nsmallest(size_of_result_list, nums)
time_d.append(time.time() - t)
def test(num_of_tests, size_of_array, size_of_result_list):
for i in xrange(num_of_tests):
a(random_gen(size_of_array), size_of_result_list)
b(random_gen(size_of_array), size_of_result_list)
c(random_gen(size_of_array), size_of_result_list)
d(random_gen(size_of_array), size_of_result_list)
print "With", size_of_array, "items, outputting ", size_of_result_list, "smallest in", num_of_tests, "tests"
print "Insertion Sort -> Total:", sum(time_a), ", Average:", sum(time_a)/len(time_a)
print "Lazy Insertion Sort -> Total:", sum(time_c), ", Average:", sum(time_c)/len(time_c)
print "Read All, TimSort -> Total:", sum(time_b), ", Average:", sum(time_b)/len(time_b)
print "HeapQ Insertion -> Total:", sum(time_d), ", Average:", sum(time_d)/len(time_d)
if __name__ == "__main__":
test(10000, 10000, 100)
```