Macromedia Director Developer Articles
|
|
|

Coding in lingo
  Sorting Multidimensional Lists
> Binary Search
  Building an efficient table
Director Quirks

Binary Search
 
October.12.2003

If you were searching for "Smith, John" in a phone book, will you randomly flip pages in no apparaent order? Would you read every name from page 1, page 2, page 3 ... all the way to page 678? Of course you will not do such a silly and time wasting chore. You will probably first flip to the middle of the book, and lets say you land at names "Logan, Wolve". Common sense will tell you that "S" is after "L" thus you will flip the phone book a few hundred pages to the right and continue the narrow down process until you reach "Smith".

Binary search is implemented as described above. Instead of searching every value on a list, it searches the middle value of the entire list. It then decides to move left or right based on if the middle value was smaller and greater. It then searches the middle value of the halfed list and continues on the process until the list window is narrowed down to what you are searching for. This process is really efficient because you are not traversing the whole list. By using binary search, you can search for a value in a 30,000 item list in 14 tries or less.

Binary search is useful when lingo's list.getPos(value) does not meet your needs. While getPos()'s performace is much quicker than anything you write in lingo, it does carry three major drawbacks.

  1. getPos() performs case sensative comparision on strings. In fact this the only things in lingo that cases about case sensativity
  2. getPos() can only retrive the first position in your list even if you have duplicate values in your list
  3. getPos() does not work on multi-dimensional lists0

In addition to overcomming these drawbacks, another advantage for owning the source code is that you can modify the script do things such as find the closest upper limit/lower limit value in addition to exact matches.

Just like a phone book, binary search only work if the search list is already in sorted order. This means that binary search is only useful if the time taken to sort and binary search your list outweight the cost it takes to perform a linear search. A linear search ( a search where you write a repeat loop from 1 to list.count ) is ideal when searching is performed only a few times on the life of the list variable. The runtime of linear search is O(n) and the runtime of binary search is O( log n). As described in the previous article, the runtime of merge sort is O( n log n ). If you make 2 or more searches on the same list, you can see that using merge sort and then use binary searching will be more efficient than linear search.

on searchBinary ptr_list, value_toMatch

  -- create a matching return list
  _intList_matchedPos = []

  -- setup
  _int_listCount = ptr_list.count
  _int_startLine = 1
  _int_endLine = _int_listCount
  _int_currentLine = (_int_startLine+_int_endLine)/2

  repeat while TRUE
    _value_current = ptr_list[_int_startLine]

    if ( _value_current = value_toMatch ) then
      -- found a match... don't need to continue binary search
      exit repeat
    else if ( _value_current < value_toMatch) then
      -- current value is too small, go the right more
      _int_startLine = _int_currentLine + 1
      _int_currentLine = ( _int_currentLine + 1 + _int_endLine ) / 2
    else
      -- found ID is too big, go to the left more
      _int_endLine = _int_currentLine - 1
      _int_currentLine = ( _int_startLine - 1 + _int_currentLine ) / 2
    end if

  end repeat
 

In binary search, 3 variables control the entire algorithm: . These three variables_int_startline, _int_endline, and _int_currentline help narrow down the list. Whenever _int_currentValue is too big, the assumption is that anything in the list greater than _int_currentValue will also be too big, thus we eliminate on the list after _int_currentValue. The first repeat while loop keeps narrowing down on the left and right values by half until we find a matching value. By dividing by 2 each repeat loop, even in the worst case secenario, a 1000 item list will require only 10 steps in the repeat loop:

  1. Immediately eliminate 500 values left or right of list position 500 based on comparison of value to find.
  2. Have 500 values remaining, immediately eliminate another 250 values based on comparsion is left or right of the value to find.
  3. Have 250 values remaining, immediately eliminate another 125 values based on comparsion is left or right of the value to find.
  4. Have 125 values remaining, immediately eliminate another 63 values based on comparsion is left or right of the value to find.
  5. Have 63 values remaining, immediately eliminate another 32 values based on comparsion is left or right of the value to find.
  6. Have 32 values remaining, immediately eliminate another 16 values based on comparsion is left or right of the value to find.
  7. Have 16 values remaining, immediately eliminate another 8 values based on comparsion is left or right of the value to find.
  8. Have 8 values remaining, immediately eliminate another 4 values based on comparsion is left or right of the value to find.
  9. Have 4 values remaining, immediately eliminate another 2 values based on comparsion is left or right of the value to find.
  10. Have 2 values remaining, immediately eliminate another 1 values based on comparsion is left or right of the value to find.

 -- allowed duplicate value so need to search the starting and ending line for the value_match

  -- search starting line
  _int_startLine = _int_currentLine
  repeat while ( value_toMatch = ptr_list[_int_startLine] )
    _int_startLine = _int_startLine - 1

    -- exit if we're already at the beginning
    if _int_startLine = 0 then exit repeat
  end repeat
  _int_startLine = _int_startLine + 1

 -- search ending line
  _int_endLine = _int_currentLine
  repeat while ( value_toMatch = ptr_list[_int_endLine] )
    _int_endLine = _int_endLine + 1

    -- exit if we're at the end
    if ( _int_endLine > _int_listCount ) then exit repeat
  end repeat
  _int_endLine = _int_endLine - 1

  -- generate return list for this range
  repeat with i = _int_startLine to _int_endLine
    _intList_matchedPos.addAt(i)
  end repeat

  return _intList_matchedPos

end

The following 2 repeat loops in the script help catch duplicate values. Just like a phone book, once you found a value "smith", there may be more entries for "smith" before and after your first hit. This first repeat loops simply traverse backwards from your first find to finding the first entry. The second repeat loop simply traverse forward to find the last entry in the list. The third repeat loop simply generates a list of positions from the first find to the last find of your search value and return to the caller.

Multi-dimensional support can be added in the same ways described in the merge sort article so they are not discusses here. The full source code illustrates the use of multi-dimensional accessing and additional error checking.

 

[Download the source code to binary search]

Please feel free to post your comments, suggestions, and/or errors on the forum or email them to calu

© Calu. All rights reserved.
Page last modified on May 17, 2004