Tuesday, January 21, 2020

First Solution

So we've worked through this example and now we understand the problem. It's good to reaffirm the assumptions that we're making before we move on to the solution.

We know that array may have some duplicate elements. We're allowed to change the array.

To find the kth smallest element, we build up to the kth smallest element by looking for the smallest element and then the next smallest element and then the next smallest element after that and then keep on going until we've gone to the kth one.

We notice that this is somewhat similar to the procedure for selection sort. And with selection sort, we find the smallest element and we swap that to a first position of the array so that we can think about ignoring it afterwards or discarding it. And then we focus on the rest of the list and find the smallest element among those remaining elements, swap that to the beginning, and then just keep on focusing on the remaining elements. And this would give us an algorithm that's a variant of selection sort that we already know. We can use our previous knowledge to design algorithm for the given problem.

But before we code this solution, let's analyze it to see if it's worth coding. Because it was our first stab at how we might solve this problem. So it's good to stop and think to evaluate before we go further in this direction. If we evaluate the performance of this approach, what we're doing at each point is finding the minimum element of an array of numbers. If we wanted to find the minimum amongst an array of elements, we have to look at each one of those elements. That takes linear time.

And so even though the number of elements that we are considering each time decreases by one, we are still doing this k many times. And so, for really small k, if we just need to find the very smallest element or the second smallest element, in that situation, this would give a linear algorithm. But in the more general situation, where k maybe is n/ 2, and we're finding the element that is right in the middle, the median, then if we have to do this approach, say n/2 many times or any O(n)many times, then all of a sudden our algorithm becomes quadratic.

And that's a problem. Because quadratic algorithms are slow, especially if our problem has something to do with sorting. And so we've already seen how the algorithm that we are devising is related to sorting, how it's related to the selection sort algorithm. Maybe we can use that insight to come up with a better solution.

And so what if we sorted the array before doing anything in terms of finding the kth element? And as a preprocessing step, let's just apply our favorite sorting algorithm and organize all the elements from smallest to biggest. And the advantage of doing that is that, at this point, the element that we care about, the kth smallest element, is in position k in the array. And accessing an element in a particular position in the array is a constant time operation.

So overall, we've now come up with an algorithm that takes however long a sorting takes for the preprocessing step. And then O(1) time for the second step which is retrieving the kth smallest element. The advantage of having these two separate steps is that sorting is a well studied problem and we know that in the worst case there are algorithms that solve sorting with nlog(n) time. That's a big improvement to quadratic time.

And so we're feeling pretty good that we've made a pretty dramatic improvement to our original naive approach to solving the problem, and now we have a slightly better approach. And at this point, we might be feeling comfortable enough with this approach to go and code it on the whiteboard.