Is gzip compression efficient for repetitive text content? Multiple terms that show up multiple times


I have a products-like page rendered with SSR in NextJS. Users can search, filter, paginate, etc.

It's not a huge number of items, so I decided to send them all as props to the page and implement all that functionality on the client side only. The raw props data is now roughly ~350kb, and gzipped it's ~75kb.

I think this is worth it, because I save a lot on database reads, and I didn't have to set up any search/cache server to implement the searching/filtering functionality. It's all done on the client, because it has all the items in memory.

The product object has a shape similar to this:

{
  longPropertyName: value,
  stringEnum: 'LONG_STRING_VALUE'
}

What I could do to optimize data traffic is shorten the property names and refactor the string enums into numeric enums, so it would be:

{
  shortProp: value,
  numericEnum: 1
}

This would surely reduce the uncompressed data size from ~350kb to maybe ~250kb.

But I'm not sure it's worth doing, because I suspect the gzipped size would remain very much the same: I'm assuming gzip compression is very good at compressing repetitive text content, like property names and string enum values that show up multiple times in the data.

Would the gzipped size shrink by the same factor, or would it stay at roughly the same value?

CodePudding user response:

gzip will find repeated strings that are no farther than 32K bytes from each other. It encodes up to 258 bytes of match at a time as a very compact length-distance pair. That is well suited to your application, whether or not you tokenize the information. Tokenization will still improve the compression, both by making the matches shorter (permitting more text per match) and by bringing what was farther away into that 32K window, enabling more distant matches.
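You can see both effects with a quick local experiment. Here is a minimal Node.js sketch; the sample data and item count are made up for illustration, not taken from your payload:

// Compare gzip sizes of verbose vs. tokenized JSON built from repetitive objects.
const zlib = require('zlib');

// Verbose shape: long property names and repeated string enum values.
const verbose = JSON.stringify(
  Array.from({ length: 1000 }, (_, i) => ({
    longPropertyName: i,
    stringEnum: 'LONG_STRING_VALUE',
  }))
);

// Tokenized shape: short keys and a numeric enum.
const tokenized = JSON.stringify(
  Array.from({ length: 1000 }, (_, i) => ({ a: i, b: 1 }))
);

for (const [label, data] of [['verbose', verbose], ['tokenized', tokenized]]) {
  console.log(`${label}: raw=${data.length} gzipped=${zlib.gzipSync(data).length}`);
}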

You can also try more modern compressors such as zstd or lzma2 (xz), which support much farther distances and longer matches.
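For a rough comparison on the actual props payload (assuming zstd and xz are installed; props.json is a placeholder file name):

gzip -9 -c props.json | wc -c
zstd -19 -c props.json | wc -c
xz -9 -c props.json | wc -c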

CodePudding user response:

Just tested it locally, and while I was able to remove 20kb from the uncompressed data by replacing the string enums with numeric enums, the compressed data shrank by only 1.1kb.

It means that gzip is already doing 95% of the work for me.

Note: I tested the gzip size with:

gzip -c filename.min.js | wc -c
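To measure both variants in one go, a small shell loop works (the file names here are placeholders):

for f in original.min.js tokenized.min.js; do
  echo "$f: $(wc -c < "$f") bytes raw, $(gzip -c "$f" | wc -c) bytes gzipped"
done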

I ended up writing minify/expand logic for those objects. Since I'm doing SSR, I minify on the server and expand on the client. That reduced the uncompressed props data from 350kb to 168kb and the final compressed page size from 75kb to 45kb.

Simple logic like:

On server:

{ longProp: value } => { a: value }
{ stringEnum: 'STRING_VALUE' } => { b: 5 }

On client:

{ a: value } => { longProp: value }
{ b: 5 } => { stringEnum: 'STRING_VALUE' }
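A minimal sketch of that minify/expand idea (the key and enum tables below are hypothetical, not the actual schema):

// Shared by server and client: key table and per-field enum tables.
const KEYS = { longPropertyName: 'a', stringEnum: 'b' };
const ENUMS = { stringEnum: { LONG_STRING_VALUE: 1, OTHER_VALUE: 2 } };

// Build reverse lookups for the client side.
const invert = (m) =>
  Object.fromEntries(Object.entries(m).map(([k, v]) => [v, k]));
const KEYS_REV = invert(KEYS);
const ENUMS_REV = Object.fromEntries(
  Object.entries(ENUMS).map(([field, m]) => [field, invert(m)])
);

// Server: shorten keys; replace enum strings with numbers only on known enum fields.
function minify(product) {
  return Object.fromEntries(
    Object.entries(product).map(([k, v]) => [
      KEYS[k] ?? k,
      ENUMS[k] ? ENUMS[k][v] : v,
    ])
  );
}

// Client: restore the long keys and the original enum strings.
function expand(min) {
  return Object.fromEntries(
    Object.entries(min).map(([k, v]) => {
      const longKey = KEYS_REV[k] ?? k;
      return [longKey, ENUMS_REV[longKey] ? ENUMS_REV[longKey][v] : v];
    })
  );
}

// minify({ longPropertyName: 42, stringEnum: 'LONG_STRING_VALUE' }) => { a: 42, b: 1 }
// expand({ a: 42, b: 1 }) => { longPropertyName: 42, stringEnum: 'LONG_STRING_VALUE' }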