Let's say I have a lot of IPv4s stored as integers (specifically, in a relational database) and I want to do a substring search on them given a string representation of an IP.
For example, a user types in 12.3
and expects that they get back results such as 12.30.45.67
, 192.168.12.3
, 1.12.34.5
, 9.212.34.5
.
If the IP were a string, I could just do a plain substring search. It might not be efficient but it is at least simple to implement and understand. But because I can't readily change it into a string at the moment, I don't see any efficient (in terms of CPU cycles, memory, and also my development/implementation time) way of doing this, but maybe I am just missing something.
CodePudding user response:
You aren't missing anything.
For example try to turn 12.3
into a series of ranges. In whichever octet the 12
is in, there will be 3 options (12
, 112
, 212
). In whichever octet the 3
is in there will be 2 options (3
and 30-39
). That's 6 range per combination of preceding octets.
But the preceding octets? We have 1 256 256*256
depending whether 0, 1 or 2 octets precede your start.
That's a grand total of 3 * 2 * (1 256 256*256) = 394758
ranges of numbers you have to search in. It is unlikely that doing that many index searches will be faster than scanning everything.
Incidentally the worst case would be 1.2
. In that case you'd have had 17 * 3 * (1 256 256*256) = 3355443
range lookups to do!
If they want this badly enough, you need to do a full text search on strings.
CodePudding user response:
Anything other than some pre-processing, indexing, caching in that case sounds too inefficient (and very hard to implement) to me.
Here are a few ideas:
- Look into creating a custom index, if possible, that enables you do string search.
- Add an automatic field to the table that represents the ip as a string and enable you to string search. Add a corresponding index of course.
- If you can't or don't want to change the schema of that table, create another one with string representations of the rows in your ip table and corresponding foreign keys that map to the primary keys of the ip table.
- If you don't want or can't edit that database at all, create an external key/value store/database where the keys are string representations of the ips and values hold the corresponding records of the ip table in the (now) external database or point to it.
In any case, I don't think searching in that table with its current form (integer) is feasible (both performance-wise and implementation complexity-wise) considering your requirements.