Tuesday, January 21, 2020

Second Solution

When you're practicing algorithmic problem solving on the fly, it's important to make the situation as realistic as possible. We are not used to whiteboarding. We don't often write out code on a whiteboard when we're developing a new program.

So practice problems away from a computer. Use either a whiteboard or a piece of paper taped up on the wall, and write out what you would be doing as though you were in the interview situation with someone next to you.

We're now going to go further in the algorithm design. We sorted the array before picking out the kth smallest element. Can we think about optimizations? Maybe we don't need to sort the whole array; maybe the array is a huge chunk of memory, but we only care about a k that is on the order of millions, not billions. So what could we do if we were in that sort of situation?

Now it's time to brainstorm other strategies for finding the kth smallest element. One data structure that keeps elements in a somewhat organized fashion is a max-heap. We know that in a max-heap, we have a tree structure whose root is the maximum element. If we're looking for the kth smallest element in a group, and we have all k smallest elements arranged in a heap, then the root of that heap will be the kth smallest: it will be the biggest among the k smallest. So we can build a heap of size k, where we make sure that the elements in that heap are the k smallest overall.

Let's think about how that would work with an example. We don't want to have all of the elements of our array in the heap; the heap would be too big. We just want to focus on the k smallest, and we're taking advantage of the difference in magnitude between k, which is the rank that we're looking for, and the overall size of the array. So suppose we restrict our heap to just three elements, say k = 3.

Then we might walk along our array and insert elements into the heap. So we insert 17, the first element, into the heap. Then we insert the next element, 42, and of course 42 becomes the root because it's bigger than 17. Then we insert 0 as well, because we need three elements in our heap. Next we look at 5. We want 5 to be in our heap because it's smaller than 42 and smaller than 17, so it's a candidate for being one of the three smallest elements, and the heap should capture the three smallest elements overall that we've seen so far.

So what we're going to do is swap out the current maximum element in the heap: remove it from the heap and put 5 in instead. And 5 has to go in at its correct location so that we still have a heap structure. The advantage is that 17 now moves up to the root, so if we find some small element later on, we know we can remove the biggest element in our current heap and put in that new, smaller element.

So at each stage the heap contains the three smallest elements that we've seen so far, and the root is the maximum of those. We continue: we look at 10, and it goes in instead of 17. We look at -3, and it goes in too, but now 5 becomes the root. We keep going with 2 and with 9. At this point we've walked the whole array, our heap has three elements, and the third smallest element in the whole array is the root of that heap. And that's beautiful, because we can just read off the root of the heap.
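To keep that walkthrough concrete, here is the same trace written out step by step. I'm assuming the elements appear in exactly the order mentioned above, i.e. the array is [17, 42, 0, 5, 10, -3, 2, 9] and k = 3; the heap contents are listed with the root first.

    element   heap afterwards (root first)   what happened
    17        [17]                           inserted (heap not yet full)
    42        [42, 17]                       inserted
    0         [42, 17, 0]                    inserted
    5         [17, 5, 0]                     5 < 42, so 42 evicted and 5 inserted
    10        [10, 5, 0]                     10 < 17, so 17 evicted and 10 inserted
    -3        [5, 0, -3]                     -3 < 10, so 10 evicted and -3 inserted
    2         [2, 0, -3]                     2 < 5, so 5 evicted and 2 inserted
    9         [2, 0, -3]                     9 > 2 (the root), so 9 is skipped

The root of the final heap is 2, which is indeed the third smallest element of the array.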

We've used a somewhat fancy data structure, but a very standard one that we have in our back pocket. We've used the property that the root is the maximum of all of the elements in the heap, and that's what helps us solve this problem. We're demonstrating creative problem solving and flexibility: we're not focused on one particular problem-solving strategy, but showing that we're thinking about different avenues.

We can now code the solution. When implementing the max-heap, we need to reverse the order of the comparator, and so we're demonstrating that we are sophisticated programmers. We also need to test and analyze our new strategy: check whether it performs any better than our very simple sort-then-pick-off-the-kth-element strategy. So we check our code, look for where we have function calls that take some time, and ask how the performance depends on k and on n.
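Here is a minimal sketch of what that code might look like. I'm using Java's java.util.PriorityQueue with a reversed comparator to get the max-heap; the method name kthSmallest and the exact structure are my own choices for illustration, not necessarily how the original solution is written.

    import java.util.Collections;
    import java.util.PriorityQueue;

    public class KthSmallest {

        // Returns the kth smallest element of a (k = 1 means the minimum).
        // Assumes 1 <= k <= a.length.
        static int kthSmallest(int[] a, int k) {
            // PriorityQueue is a min-heap by default, so we reverse the
            // comparator to get a max-heap whose root is the largest of
            // the k smallest elements seen so far.
            PriorityQueue<Integer> heap =
                new PriorityQueue<>(k, Collections.reverseOrder());

            for (int x : a) {
                if (heap.size() < k) {
                    heap.offer(x);          // the first k elements always go in
                } else if (x < heap.peek()) {
                    heap.poll();            // evict the current maximum ...
                    heap.offer(x);          // ... and keep the smaller element
                }
            }
            return heap.peek();             // root = kth smallest overall
        }

        public static void main(String[] args) {
            // The walkthrough's example: k = 3 should give 2,
            // since the three smallest values are -3, 0 and 2.
            int[] a = {17, 42, 0, 5, 10, -3, 2, 9};
            System.out.println(kthSmallest(a, 3));   // prints 2
        }
    }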

We notice that in the beginning stage, for the first k elements of the array, we're just adding each one of them into the heap. Afterwards, we only add to the heap if the current array element we're looking at is smaller than the maximum element at the root of the heap, and only at that point do we swap the current element into the heap. In each of those situations we have to do some heap operations: we're adding into the heap, and we may also be removing the root of the heap.

So here it's important to know that heap operations like insert and remove take time that's logarithmic in the size of the heap, because we might have to traverse a path all the way down to the bottom, and the maximum length of a path is at worst logarithmic in the size of the heap, the size of the tree. We're doing a constant number of these operations for each array element that we look at, so all in all, this algorithm is going to have performance that's O(n log k). We can compare this performance to the previous solution.

We're demonstrating our critical thinking by analyzing the two alternative algorithmic approaches. The previous algorithm was O(n log n), and now we have O(n log k). So we've made an improvement, in particular when k is much smaller than n; but k still grows with the input, so we still want to take it into account. This is a very different problem-solving strategy from the first one we saw.
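To make that comparison concrete, here is a rough back-of-the-envelope count, using the magnitudes mentioned earlier (n on the order of a billion, k on the order of a million), plus a smaller k for contrast; these are only operation counts implied by the big-O bounds, not measured running times.

    n = 10^9, k = 10^6:
      n log2 n  ≈  10^9 × 30  =  3 × 10^10
      n log2 k  ≈  10^9 × 20  =  2 × 10^10     (roughly 1.5x fewer operations)

    n = 10^9, k = 10^3:
      n log2 k  ≈  10^9 × 10  =  1 × 10^10     (roughly 3x fewer operations)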

And we might go even further and that's what we'll do next.