Home > Software engineering >  Is this a reasonable method of performing a binary search on an array of strings?
Is this a reasonable method of performing a binary search on an array of strings?

Time:12-30

Given:

  1. All strings used only contain ASCII characters.
  2. A base-128 integer type has been defined. 128 comes from the number of possible characters we will be expecting.
  3. Although most basic operations are not supported, there is implementation to compare one of these base-128 integers to another (<, >, =).
  4. Our data is stored in an array of base-128 integers that each represents a string.

The sorting process would work like this:

  1. Convert all given strings into base-128 integers. This can be done by having each ASCII character represent a different value at each place. Store these in the array.
  2. Utilize a general sorting algorithm using the comparisons defined in the base-128 integer class.

The searching process would work like this:

  1. Convert the "search" string to a base-128 integer.
  2. Implement a basic binary search method on the array of base-128 integers and return the index of a matching value if found.

This seems to work for the most part, but I'd like to know if there's a more reasonable way to approach the problem.

CodePudding user response:

There is no point to convert strings to numbers. (Coincidentally, they appear to have the exact same memory layout anyway - the conversion is a no-op).

You say that

there is implementation to compare one of these base-128 integers to another (<, >, =).

but that's meaningless. We can equally state

there is implementation to compare one of these strings to another (<, >, =).

(which in fact does exist) and all you have described is to "use a general sorting algorithm on an array of strings" and "Implement a basic binary search method on the array of strings". The description of your method becomes empty - yes you can always do X by doing X.


That said, no this is a method with bad time complexity. Lexical string comparison (just like base-128 integer comparison) is linear in the length of the string/integer. It is more efficient to actually build a trie from the array of strings and search in that.

  • Related