Compressed Sqlite database and indexing


I read that it is possible to compress SQLite databases with extensions like sqlite-zstd.

In SQLite3, are there methods, ready to use from Python, that allow both of the following:

  • compression of a text column (say 1 billion rows, with at least one text column of fewer than 20 characters), and
  • keeping the fast lookup we get when this column has an INDEX (and even prefix search, e.g. LIKE 'foo%')?

I was about to write some code with LZ4-compressed rows, but then a single search/lookup would require a full scan (decompressing every value to see whether there is a match).
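
Here is roughly what I had in mind, as a minimal sketch using the third-party lz4 package (table and column names are just examples); it shows why the lookup degenerates into a full scan:

```python
# Per-row LZ4 compression: the compressed BLOBs are opaque to SQLite,
# so no index or LIKE optimization can help with a prefix search.
import sqlite3
import lz4.frame

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (txt BLOB)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [(lz4.frame.compress(w.encode()),) for w in ("foo", "foobar", "bar")],
)

# Prefix search 'foo%': every row must be fetched, decompressed and
# checked in Python, i.e. a full table scan.
matches = [
    lz4.frame.decompress(row[0]).decode()
    for row in conn.execute("SELECT txt FROM items")
    if lz4.frame.decompress(row[0]).decode().startswith("foo")
]
print(matches)  # ['foo', 'foobar']
```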

Are there SQLite techniques suited to this (or other data structures)?

CodePudding user response:

SQLite doesn't compress its data by default, but there are proprietary SQLite extensions that can do the required work. ZIPVFS is one such extension/add-on that lets you read and write compressed data using zlib or any other application-supplied compression and decompression functions.
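
ZIPVFS is proprietary and is normally compiled into a custom SQLite build, so there is no stock Python call for it. As a rough sketch of the general pattern, this is how a run-time loadable compression extension such as the sqlite-zstd mentioned in the question could be hooked up from Python's stdlib sqlite3 module; the extension path and the zstd_enable_transparent() configuration follow that project's documentation and are illustrative, not part of SQLite itself:

```python
# Sketch: loading a run-time loadable compression extension from Python.
# Path, table/column names and options are assumptions for illustration.
import sqlite3

conn = sqlite3.connect("data.db")
conn.enable_load_extension(True)          # needs a Python/SQLite build with extension support
conn.load_extension("./libsqlite_zstd")   # assumed path to the compiled extension
conn.enable_load_extension(False)

# Enable transparent row-level compression for one text column
# (parameters are illustrative; check the extension's README for the exact options).
conn.execute(
    """SELECT zstd_enable_transparent(
           '{"table": "items", "column": "txt", "compression_level": 19}')"""
)
conn.commit()
```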

CodePudding user response:

The simplest solution is probably a trie of separately compressed SQLite or Parquet files, but this is a common enough problem that there is likely a file format built for it.

The biggest objective is to maintain data locality across prefixes while the data is changing and unsorted. If you just compress a single SQLite database, then every insert forces the compression engine to accommodate two shifts in the data: one at the end of the data table, and another possibly in the middle of the b-tree index itself.
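
As a minimal sketch of the "separately compressed files keyed by prefix" idea (a single-level trie, really), assuming an illustrative directory layout, a two-character shard width and a toy schema:

```python
# One small SQLite file per 2-character prefix, so a prefix lookup only
# touches one shard. Paths, shard width and schema are illustrative.
import sqlite3
from pathlib import Path

SHARD_DIR = Path("shards")
SHARD_DIR.mkdir(exist_ok=True)

def shard_for(text: str) -> sqlite3.Connection:
    """Open (and lazily create) the shard holding rows with this 2-char prefix."""
    conn = sqlite3.connect(SHARD_DIR / f"{text[:2]}.db")
    conn.execute("CREATE TABLE IF NOT EXISTS items (txt TEXT PRIMARY KEY)")
    return conn

def insert(text: str) -> None:
    with shard_for(text) as conn:
        conn.execute("INSERT OR IGNORE INTO items VALUES (?)", (text,))

def prefix_search(prefix: str) -> list[str]:
    # Only one small shard is opened; inside it, the index on txt can still
    # serve the prefix match (subject to SQLite's usual LIKE/collation rules),
    # and even a scan of one shard stays cheap.
    with shard_for(prefix) as conn:
        return [r[0] for r in conn.execute(
            "SELECT txt FROM items WHERE txt LIKE ? || '%'", (prefix,))]

insert("foobar")
insert("football")
print(prefix_search("foo"))  # ['foobar', 'football']
```

A prefix lookup then only has to open one small shard, and each shard can be compressed independently (or kept on a compressed filesystem) without losing locality.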

About your proposed table: why do you have a shadow primary key? What is it necessary for? It seems clear that your text is the primary key, so why not use it as such?
