Home > Software engineering >  What data-structure to use for a use-case "this string exists/doesn't exist in a mutable,
What data-structure to use for a use-case "this string exists/doesn't exist in a mutable,

Time:01-06

I need a way to persist strings, and then have an interface to later check whether a string is in that list or not.

I don't need any metadata, nor any filters, ordering or such. Having the list (re)ordered on insert is fine. I'll be doing many more lookups than inserts, so prefer a structure that is optimized for the lookups. I'll be looking up far more items that don't exist, than items that do. So a data-structure that can answer "it doesn't exist" without traversing the entire set, is preferred. I have no idea if this is even possible.

A pseudo-codish interface would look like:

let word_repo = WordRepo.new()
word_repo.insert("hello")
word_repo.insert("world")

word_repo.contains("hello") => true
word_repo.contains("mars") => false

How is such a data-structure called? Or what would such a simple "database" be called, conceptually?

I'd prefer to have it embedded, simple file on disk, not in a separate database-server (like redis), and plan to include it in a rust project. I currently (ab)use a file-based key-value store, but that is a bit overkill, since I really only need the "keys" and never the values.

CodePudding user response:

Do you have removing/replacing methods or only inserts?

The simplest way: a hashtable with keys. It's mutable and it has a "contains"-method. (okay, there are dozens of trees/tries/custom data structures for that, not only hashtables)

The next level: Bloom Filter (https://en.wikipedia.org/wiki/Bloom_filter) persisted storage (hash-table?).

So, Bloom Filter is probabilistic and pretty small. You can have it in memory. In case of probable-cache-hit, you have to check it at persisted storage.

Bloom Filter doesn't support removing, but Cuckoo Filter does https://en.wikipedia.org/wiki/Cuckoo_filter (the proper implementation/library is more challenging than Bloom)

  • Related