Sorting Multidimensional Lists |
|
October.12.2003 |
 |
Of all the sorting algorithms around, some attributes in merge sort's makes it ideal to carry over into lingo. The performance isn't as great as something like red-black trees, but because Director doesn't really have pointer data types, merge sort is going to be the easiest sorting algorithm to implement in lingo. It has a runtime of O(n log n), thus is an ideal sorting algorithm when speed and scalabilty are important. As this article continues, I will also discuss ways of adapting merge sort for sorting multidimensional lists and end this article with how the performance of merge sort can be further improved for those up to the challenge.
The code to merge sort is actually pretty simple.There are 2 handlers: mergeSort() and listMerge(). The mergeSort handler uses recursion to break up large lists into smaller ones. The listMerge handler re-orders the individual items in the broken up lists into their sorted position.
on mergeSort(ptr_list, int_startPos, int_endPos)
if int_startPos < int_endPos then
_int_midpoint = ( int_startPos + int_endPos ) / 2
mergeSort(ptr_list, int_startPos, _int_midpoint)
mergeSort(ptr_list, _int_midpoint+1, int_endPos)
listMerge(ptr_list, int_startPos, _int_midpoint+1, int_endPos)
end if
end
Anytime you've got a difficult task at hand, the best way to approach the task is to break down the problem into smaller pieces. This is what the merge sort algorithm is doing. It uses recursion to break apart big lists into 2 equally sized lists. And then break that halfed size list into an even smaller halfed size lists, until eventually you're down to individual lists containing 2 items or less. Once the list is in that manageable size, call handler listMerge() to merge items into their sorted position.
on listMerge ptr_list, int_startPos, int_midpoint, int_endPos
_int_copiedLeft = 0
_int_copiedRight = 0
_int_copied = 0
_int_absLeftEndPos = int_midpoint - int_startPos
_int_absRightEndPos = int_endPos - int_midpoint + 1
_list_tmp = []
repeat while ( _int_copiedLeft < _int_absLeftEndPos ) and ( _int_copiedRight < _int_absRightEndPos)
_int_copied = _int_copied + 1
_int_leftPos = int_startPos+_int_copiedLeft
_int_rightPos = int_midpoint+_int_copiedRight
_int_leftValue = ptr_list[_int_leftPos]
_int_rightValue = ptr_list[_int_rightPos]
-- compare lists to see if the left or right side is bigger
if ( _int_leftValue < _int_rightValue ) then
_list_tmp[_int_copied] = ptr_list[_int_leftPos]
_int_copiedLeft = _int_copiedLeft + 1
else
_list_tmp[_int_copied] = ptr_list[_int_rightPos]
_int_copiedRight = _int_copiedRight + 1
end if
end repeat
-- add everything remaining on the left list
repeat while ( _int_copiedLeft < _int_absLeftEndPos )
_int_copied = _int_copied + 1
_list_tmp[int_copied] = ptr_list[int_startPos+_int_copiedLeft]
_int_copiedLeft = _int_copiedLeft + 1
end repeat
-- add everything remaining on the right list
repeat while ( _int_copiedRight < _int_absRightEndPos )
_int_copied = _int_copied + 1
_list_tmp[int_copied] = ptr_list[int_midpoint+_int_copiedRight]
_int_copiedRight = _int_copiedRight + 1
end repeat
-- change it back in the pointer
_int_counter = 1
repeat with i = int_startPos to int_endPos
-- reattach pointer that points to the correct address from the tmp list
ptr_list[i] = _list_tmp[_int_counter]
_int_counter = _int_counter + 1
end repeat
end
Believe it or not, the code above, is all the code you will need to implement a merge sort for one dimensional lists. All error checking have been removed for illustration purposes.
Notice that both mergeSort() and listMerge() don't have a return value. They both work off pointers. Director provides pointer reference when used as a parameter value if your data type is a: list, rect, loc, or object. References can improve performance because there is no need to duplicate a temporary list onto the call stack. This is especially useful when making frequent recursive calls. Making merge sort work on multidimensional list
Adding multidimensional support will be easy since you have access to modifying the entire source code for the algorithm. The difference between sorting a single dimension list and multidimension list is at comparing values in list level 1 versus comparing values in list level N. Thus the only code change needed is an additional parameter for specifying the list level, and some additional code for accessing the specified list level.
Typically, if you write in lingo: myList[1][3], you are telling the lingo parser to access the 3rd item in the list of the first item in myList. To use the same idea of parameter passing, and to achieve the same method of accessing multidimensions when writing your scripts of your own, minor changes must be made to accomondate for the lack of lingo parsing capabilities in your own code. One quick way of implementing multidimension accessing in your own script is to use a single list that can hold all the list accessor's. This means you would pass parameters in the form of [1,3] and have a code such as:
_salesTableList = [ ["car1", 1999, 20000], ["car3", 2004, 60000], ["car2", 2002, 40000] ]
_listAccessor = [1,3]
_ptrList = _salesTableList
_int_listAccessorCount = listAccessor.count
repeat with i = 1 to _int_listAccessorCount
_int_listIndex = _listAccessor[i]
_ptrList = _ptrList[_int_listIndex]
end repeat
This technique is used to access lists within lists. If you wanted to sort the list above by year(2nd item in the inner list), you could create a parameter with value, [n ,2] which will access the 2nd item in each inner list of the array.
Improving the performance of merge sort further
Runtime is very important to code writers who care about performance. As mentioned above, the runtime of merge sort is O(n log n). Runtime, for those who never seen it before, basically means that : suppose each line of significant code costs 1 instruction to process, then how much processing will this algorithm cost me on average? Runtime is important when you want to know how much of a performance hog will the algorithm perform when you change your lists from something like 100 items to 100000 items. When your list has n items, the ideal runtime O(n) code is to write an algorithm that can scan your list once, from 1 to n, and have entire list sorted in that single run. When you have a piece of code such as
repeat with i = 1 to n
repeat with j = 1 to n
-- do something
end repeat
end repeat
Bubble sort, insertion sort are 2 examples that have a script structures like the example above. This example above illustrate a poor performance runtime of at least O(n*n) or simply O(n2). The performance in this these type of code is poor and does not scale well on large lists.
For those who are up to the challenge, quicksort is an algorithm that have similar characteristics to the merge sort algorithm. Quicksort algorithm also uses partitioning to break up one lists into 2 smaller ones. Quicksort also uses recursive calls to manage and process smaller lists. The major difference between the algorithms is that quicksort uses in place sorting, instead of creating a new temporary list during the merge process. Another difference is at how the lists is divided down. The biggest shock may be that quicksort picks a random value for its partitioning compared to always partitioning the bigger list into 2 equally sized lists. It's been proven that many the random function have desirable attributes that helps increase it's performance for the average cases. The average runtime for quicksort is still O(n log n) however, since quicksort don't waste memory by creating a temporary list during each merge, there will be a performance gain. The quicksort algorithm is not provided. Those interested should look at google for an example in other programming languages and attempt to write their own in lingo as practice. |