Binary
Search |
|
October.12.2003 |
 |
If you were searching for "Smith, John"
in a phone book, will you randomly flip pages in no apparaent
order? Would you read every name from page 1, page 2, page 3 ...
all the way to page 678? Of course you will not do such a silly
and time wasting chore. You will probably first flip to the middle
of the book, and lets say you land at names "Logan, Wolve".
Common sense will tell you that "S" is after "L"
thus you will flip the phone book a few hundred pages to the right
and continue the narrow down process until you reach "Smith".
Binary search is implemented as described
above. Instead of searching every value on a list, it searches
the middle value of the entire list. It then decides to move left
or right based on if the middle value was smaller and greater.
It then searches the middle value of the halfed list and continues
on the process until the list window is narrowed down to what
you are searching for. This process is really efficient because
you are not traversing the whole list. By using binary search,
you can search for a value in a 30,000 item list in 14 tries or
less.
Binary search is useful when lingo's list.getPos(value)
does not meet your needs. While getPos()'s performace is much
quicker than anything you write in lingo, it does carry three
major drawbacks.
- getPos() performs case sensative comparision on strings.
In fact this the only things in lingo that cases about case
sensativity
- getPos() can only retrive the first position in your list
even if you have duplicate values in your list
- getPos() does not work on multi-dimensional lists0
In addition to overcomming these drawbacks,
another advantage for owning the source code is that you can modify
the script do things such as find the closest upper limit/lower
limit value in addition to exact matches.
Just like a phone book, binary search only
work if the search list is already in sorted order. This means
that binary search is only useful if the time taken to sort and
binary search your list outweight the cost it takes to perform
a linear search. A linear search ( a search where you write a
repeat loop from 1 to list.count ) is ideal when searching is
performed only a few times on the life of the list variable. The
runtime of linear search is O(n) and the runtime of binary search
is O( log n). As described in the previous article, the runtime
of merge sort is O( n log n ). If you make 2 or more searches
on the same list, you can see that using merge sort and then use
binary searching will be more efficient than linear search.
on
searchBinary ptr_list, value_toMatch
-- create a matching return
list
_intList_matchedPos = []
-- setup
_int_listCount = ptr_list.count
_int_startLine = 1
_int_endLine = _int_listCount
_int_currentLine = (_int_startLine+_int_endLine)/2
repeat while TRUE
_value_current = ptr_list[_int_startLine]
if ( _value_current
= value_toMatch ) then
-- found
a match... don't need to continue binary search
exit
repeat
else if (
_value_current < value_toMatch) then
-- current
value is too small, go the right more
_int_startLine = _int_currentLine
+ 1
_int_currentLine = ( _int_currentLine
+ 1 + _int_endLine ) / 2
else
-- found
ID is too big, go to the left more
_int_endLine = _int_currentLine
- 1
_int_currentLine = ( _int_startLine
- 1 + _int_currentLine ) / 2
end if
end repeat
In binary search, 3 variables control the
entire algorithm: . These three variables_int_startline, _int_endline,
and _int_currentline help narrow down the list. Whenever _int_currentValue
is too big, the assumption is that anything in the list greater
than _int_currentValue will also be too big, thus we eliminate on the list after _int_currentValue.
The first repeat while loop keeps narrowing down on the left and
right values by half until we find a matching value. By dividing
by 2 each repeat loop, even in the worst case secenario, a 1000
item list will require only 10 steps in the repeat loop:
- Immediately eliminate 500 values left or right of list position
500 based on comparison of value to find.
- Have 500 values remaining, immediately eliminate another 250
values based on comparsion is left or right of the value to
find.
- Have 250 values remaining, immediately eliminate another 125
values based on comparsion is left or right of the value to
find.
- Have 125 values remaining, immediately eliminate another 63
values based on comparsion is left or right of the value to
find.
- Have 63 values remaining, immediately eliminate another 32
values based on comparsion is left or right of the value to
find.
- Have 32 values remaining, immediately eliminate another 16
values based on comparsion is left or right of the value to
find.
- Have 16 values remaining, immediately eliminate another 8
values based on comparsion is left or right of the value to
find.
- Have 8 values remaining, immediately eliminate another 4 values
based on comparsion is left or right of the value to find.
- Have 4 values remaining, immediately eliminate another 2 values
based on comparsion is left or right of the value to find.
- Have 2 values remaining, immediately eliminate another 1 values
based on comparsion is left or right of the value to find.
-- allowed duplicate value so need to search the starting and ending line for the value_match
-- search starting line
_int_startLine = _int_currentLine
repeat while ( value_toMatch = ptr_list[_int_startLine] )
_int_startLine = _int_startLine - 1
-- exit if we're already at the beginning
if _int_startLine = 0 then exit repeat
end repeat
_int_startLine = _int_startLine + 1
-- search ending line
_int_endLine = _int_currentLine
repeat while ( value_toMatch = ptr_list[_int_endLine] )
_int_endLine = _int_endLine + 1
-- exit if we're at the end
if ( _int_endLine > _int_listCount ) then exit repeat
end repeat
_int_endLine = _int_endLine - 1
-- generate return list for this range
repeat with i = _int_startLine to _int_endLine
_intList_matchedPos.addAt(i)
end repeat
return _intList_matchedPos
end
The following 2 repeat loops in the script help catch duplicate values. Just like a phone book, once you found a value "smith", there may be more entries for "smith" before and after your first hit. This first repeat loops simply traverse backwards from your first find to finding the first entry. The second repeat loop simply traverse forward to find the last entry in the list. The third repeat loop simply generates a list of positions from the first find to the last find of your search value and return to the caller.
Multi-dimensional support can be added in
the same ways described in the merge sort article so they are
not discusses here. The full source code illustrates the use of
multi-dimensional accessing and additional error checking.
|