The 'halve it every time' idea. How binary search finds something in a million items in about twenty steps, where it quietly powers databases and even git, and the one condition it needs.
In the last post I said the hash map is my favourite structure because it turns “give me this exact thing” into a single step. But I also said its weakness: it has no sense of order. It cannot answer “everything between these two dates” or “the closest match below this value.” For those, you need the data in sorted order, and once it is sorted, you get to use one of the most satisfying ideas in computing: binary search.
You have played this without knowing its name. I think of a number between 1 and 100, and you guess. Each time, I only tell you “higher” or “lower.” What do you do? You do not start at 1 and count up. You guess 50. I say lower. You guess 25. I say higher. You guess 37. And so on. Every guess throws away half of what is left.
That is binary search. Look at the middle, decide which half the answer is in, throw the other half away, repeat. It is exactly how you use a physical dictionary. You do not read from page one. You open near the middle, see whether your word is before or after, and flip into the correct half.
Because you halve the problem every step, the number of steps grows incredibly slowly.
A plain scan of a billion items does a billion checks. Binary search does about thirty. And here is the magic of it: doubling the size of the data adds only one extra step. We call this “order log n,” and it is the next best thing to instant.
def binary_search(sorted_items, target): low, high = 0, len(sorted_items) - 1 while low <= high: mid = (low + high) // 2 # the middle if sorted_items[mid] == target: return mid # found it elif sorted_items[mid] < target: low = mid + 1 # answer is in the right half else: high = mid - 1 # answer is in the left half return -1 # not present
nums = [3, 11, 18, 27, 34, 42, 55, 61] # must be sortedprint(binary_search(nums, 42)) # 5Notice the one assumption sitting quietly in there: the list must be sorted. That is the whole price of admission.
You rarely write binary search by hand, because it is built into the tools you already use. But it is underneath a lot of them.
git bisect checks out the commit halfway through your history, you say “good” or “bad,” and it halves the range each time. A thousand commits, found in about ten checks. That is binary search applied to your own project history.Hash map and binary search are the two answers to two different questions. The hash map says “give me this exact thing, now.” Binary search says “the data is in order, so let me jump to roughly the right place and home in.” One trades memory for instant exact lookups. The other trades the cost of sorting for fast order-aware search.
Between them, they cover a huge share of what programs spend their time doing, which is finding things. Most of the speed you feel in good software is one of these two ideas working quietly underneath.
Next in this series: stacks and queues, the two ways to wait in line.