## Why is insertion sort O(N^2) ?

Boson Collider
Posts: 13
Joined: Wed Oct 29, 2014 6:20 pm UTC

### Why is insertion sort O(N^2) ?

So, first defining insertion sort, to make sure we're on the same page. I picture the algorithm in pseudocode like this:

Sort (UnsortedList) = InsertionSort (empty list, UnsortedList)

InsertionSort (SortedList, UnsortedList) =
If UnsortedList == empty list,
then return SortedList
Else return InsertionSort( Insert(Head of UnsortedList, SortedList ) , Tail of UnsortedList )

Where the head of a list is its first element and its tail the rest of the list, and where Insert(element, List) is a function that inserts the element in the correct position into the List and returns the new sorted list.
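The pseudocode above transliterates into Python roughly like this (a sketch only; the tail recursion is written as a loop, and `insert` here is the straightforward linear-scan version whose cost the rest of the thread is about):

```python
def insert(element, sorted_list):
    # Scan for the first position where element fits; O(n) worst case.
    for i, x in enumerate(sorted_list):
        if element < x:
            return sorted_list[:i] + [element] + sorted_list[i:]
    return sorted_list + [element]

def insertion_sort(unsorted):
    # The N tail-recursive calls of InsertionSort, written as a loop.
    sorted_list = []
    for element in unsorted:
        sorted_list = insert(element, sorted_list)
    return sorted_list
```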

This leads to N tail recursive calls. As I understand it, the argument that the algorithm has O(N^2) complexity comes from the assumption that the Insert function runs in O(N) worst case.

But shouldn't you be able to write an insert function that runs in O(Log N) time, since Insert is only called with a sorted list? For example:

Insert(element, List) =
if element > Last (List) then
return List:element
else if element > Middle (List) then
return FirstHalf (List) ++ Insert (element, LastHalf (List))
else return Insert (element, FirstHalf (List)) ++ LastHalf (List)

should run in O(Log N) time if I'm not mistaken, with not much overhead. I'm sure you could do even better with a nonrecursive function that works directly with list indices.

So why is InsertionSort usually given as an O(N^2) time algorithm? Is there something that I've missed?

korona
Posts: 495
Joined: Sun Jul 04, 2010 8:40 pm UTC

### Re: Why is insertion sort O(N^2) ?

The insertion cannot be implemented in O(log n). How do you represent the list? Sorting algorithms are usually designed to work on arrays. In that case you can use binary search to find the correct position for the element, but in order to insert it you have to shift O(n) other elements in the worst case. If you choose to represent the list as a linked list, you can insert an element in the middle in O(1), but then you cannot use binary search to find the correct position.
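The array case can be illustrated with Python's `bisect` module (names here are illustrative): the binary search itself is O(log n), but the insert still shifts elements.

```python
import bisect

def binary_insertion_sort(arr):
    result = []
    for x in arr:
        pos = bisect.bisect_right(result, x)  # O(log n) comparisons to find the slot
        result.insert(pos, x)                 # O(n) element shifts in the worst case
    return result
```

The comparison count drops to O(n log n), but the shifting keeps the overall worst case at O(n^2).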

Boson Collider
Posts: 13
Joined: Wed Oct 29, 2014 6:20 pm UTC

### Re: Why is insertion sort O(N^2) ?

Ah, OK, so the explanation is that even if you only need O(N log N) comparisons, other operations on common data structures scale worse and become dominant once comparisons are no longer the bottleneck. That makes sense. I guess it also explains why insertion sort works so well for non-computing applications like sorting playing cards, where shifting costs nothing.

eMPiko
Posts: 6
Joined: Wed Jul 11, 2012 11:12 am UTC

### Re: Why is insertion sort O(N^2) ?

Not exactly: you're saying that other operations cause the whole algorithm to exceed O(n log n). You can achieve O(n log n) using the right data structures, such as self-balancing binary search trees (e.g. red-black trees). That variant is nevertheless inferior to other approaches because of the overhead of such structures. This kind of sorting is sometimes also called binary insertion sort. Normal insertion sort is traditionally implemented over an array, which is why O(N^2) is used when discussing it.

KnightExemplar
Posts: 5494
Joined: Sun Dec 26, 2010 1:58 pm UTC

### Re: Why is insertion sort O(N^2) ?

Boson Collider wrote:This leads to N tail recursive calls. As I understand it, the argument that the algorithm has O(N^2) complexity comes from the assumption that the Insert function runs in O(N) worst case.

But shouldn't you be able to write an insert function that runs in O(Log N) time, since Insert is only called with a sorted list? For example:

Alternative answer: Yes you can. That's called HeapSort. A "heap" is a partially ordered structure (not a fully sorted list) with O(log N) insertion and O(log N) removal of the minimum element.

Spoiler:
Sorta. An optimized heapsort will use "heapify" to build the heap in O(n) time. Academically, however, building the heap with n * log(n) steps makes HeapSort easier to explain.

If you have an "insertion sort" that inserts into a heap, it's called HeapSort.
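A sketch of that idea with Python's `heapq` module; note that the heap is only partially ordered internally, and the sorted order appears as elements are popped off:

```python
import heapq

def heap_sort(items):
    heap = []
    for x in items:
        heapq.heappush(heap, x)  # O(log n) per insertion
    # Popping the minimum repeatedly yields the elements in sorted order,
    # at O(log n) per removal.
    return [heapq.heappop(heap) for _ in range(len(heap))]
```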

ucim
Posts: 6715
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

### Re: Why is insertion sort O(N^2) ?

korona wrote:If you chose to represent the list as a linked list you can insert an element in the middle of the list in O(1) but you cannot use binary search to find the correct position.
Suppose you partially index the list? For example, you store (in an array) the address and value of every (say) tenth element. You can use a binary search on this array to jump into the linked list, so you don't have to traverse the whole thing, just the last mile. Yes, there's some overhead in maintaining the array, but the array needn't be perfect in order to speed the thing up. Would this speedup not be enough to reduce the O factor? What if it were done recursively (that is, index the index as n gets bigger)?
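A rough Python sketch of that partial index (all names hypothetical). It only covers lookup; keeping the checkpoint array correct while new nodes are inserted is exactly the part this sketch leaves out:

```python
import bisect

class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def build_sorted_list(values):
    """Build a sorted singly linked list from already-sorted values."""
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head

def build_index(head, k=10):
    """Checkpoint (value, node) for every k-th node of the list."""
    index, node, i = [], head, 0
    while node:
        if i % k == 0:
            index.append((node.value, node))
        node, i = node.next, i + 1
    return index

def contains(index, target):
    """Binary-search the checkpoints, then walk only the 'last mile'."""
    values = [v for v, _ in index]
    j = bisect.bisect_right(values, target) - 1
    if j < 0:
        return False  # target precedes the first checkpoint
    node = index[j][1]
    while node is not None and node.value <= target:
        if node.value == target:
            return True
        node = node.next
    return False
```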

Jose

Tub
Posts: 431
Joined: Wed Jul 27, 2011 3:13 pm UTC

### Re: Why is insertion sort O(N^2) ?

I believe there are four characteristics of insertion sort:
a) you're taking the input elements one by one.
b) you're inserting the elements directly into the output data structure.
c) after each insert, the output data structure is sorted.
d) the memory overhead is O(1).

If you push the elements into a heap, then empty the heap into the output structure, it's not insertion sort because of b). As soon as you add an index on top of the output data structure, it's not insertion sort because of d).

And I think that's an important point to make: adding an index is not improving insertion sort. It's making a different tradeoff, memory vs. speed.

About your actual suggestion: the index you're describing is called a skip list. Done correctly, it has O(log(n)) insertion cost on average, but the worst-case insertion cost is O(n). So, unless you can prove otherwise via amortization, the worst-case complexity of the whole sort must still be assumed to be O(n^2).
It's possible to reduce this to O(n*log(n)) with a different index structure (a B+ tree comes to mind; it keeps the leaf nodes in a linked list anyway), but again, this requires additional memory, so it's not insertion sort.

Also, these things only work if your output structure is a list. On an array, I'm not aware of any way to get insertion sort below O(n^2) while keeping characteristics a, b and c.

Derek
Posts: 2180
Joined: Wed Aug 18, 2010 4:15 am UTC

### Re: Why is insertion sort O(N^2) ?

KnightExemplar wrote:
Boson Collider wrote:This leads to N tail recursive calls. As I understand it, the argument that the algorithm has O(N^2) complexity comes from the assumption that the Insert function runs in O(N) worst case.

But shouldn't you be able to write an insert function that runs in O(Log N) time, since Insert is only called with a sorted list? For example:

Alternative answer: Yes you can. That's called HeapSort. A "heap" is a partially ordered structure (not a fully sorted list) with O(log N) insertion and O(log N) removal of the minimum element.

Spoiler:
Sorta. An optimized heapsort will use "heapify" to build the heap in O(n) time. Academically, however, building the heap with n * log(n) steps makes HeapSort easier to explain.

If you have an "insertion sort" that inserts into a heap, it's called HeapSort.

Alternatively, you can insert the elements into a binary search tree. This is called Tree Sort.
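A minimal sketch of Tree Sort in Python, using a plain (unbalanced) binary search tree: O(n log n) on average, but O(n^2) worst case on already-sorted input unless the tree is self-balancing, as noted earlier in the thread.

```python
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def bst_insert(root, value):
    # O(height) per insertion: O(log n) average, O(n) if the tree degenerates.
    if root is None:
        return TreeNode(value)
    if value < root.value:
        root.left = bst_insert(root.left, value)
    else:
        root.right = bst_insert(root.right, value)
    return root

def tree_sort(items):
    root = None
    for x in items:
        root = bst_insert(root, x)
    out = []
    def inorder(node):
        # In-order traversal visits the values in ascending order.
        if node:
            inorder(node.left)
            out.append(node.value)
            inorder(node.right)
    inorder(root)
    return out
```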