Serialising compressed data to JSON produces a larger outcome than uncompressed

Time:05-24

I have some code which takes a string and if it is above a given threshold, it compresses it. The result is then serialised to JSON and converted into an Azure Service Bus Message object.

The problem is that the serialised compressed version comes out larger than the serialised uncompressed one.

// The original Payload.
string originalPayload = m.Payload;
int originalInBytes = Encoding.UTF8.GetByteCount(originalPayload); // 259122 bytes.
string originalSerialised = JsonConvert.SerializeObject(originalPayload);
int originalSerialisedInBytes = Encoding.UTF8.GetByteCount(originalSerialised); // 259182 bytes.

// The compressed Payload.
byte[] compressedPayload = _compressionService.Zip(m.Payload); // 195845 bytes. Reduced as expected.
string compressedSerialised = JsonConvert.SerializeObject(compressedPayload);
int compressedSerialisedInBytes = Encoding.UTF8.GetByteCount(compressedSerialised); // 261130 bytes. Greater than what we started with.

// Message created using the original payload. Message.Size = 259705.
m.Payload = originalPayload;
m.CompressedPayload = null;
Message origPayload = GetServiceBusMessage(m, delay, false, customProperties);

// Message created using the compressed payload. Message.Size = 261651.
m.Payload = string.Empty;
m.CompressedPayload = compressedPayload;
Message compPayload = GetServiceBusMessage(m, delay, false, customProperties);

bool working = origPayload.Size > compPayload.Size; // False. Should be true.

Can anyone tell me what is causing this issue? It seems to be the SerializeObject call which is causing the discrepancy, but I am not clear on why this is.

Answer:

Firstly, you shouldn't expect compression to always reduce the size. If that were always true, you could keep applying compression recursively and end up with an empty result which could be decompressed to "anything at all".
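To see this concretely, here is a minimal sketch that compresses random bytes, which contain no redundancy for a compressor to exploit. It assumes gzip via `System.IO.Compression.GZipStream`; the question does not say which algorithm `_compressionService.Zip` uses, and `CompressionDemo`, `Gzip` and `RandomBytes` are hypothetical names introduced only for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class CompressionDemo
{
    // Gzip-compresses a buffer and returns the compressed bytes.
    // Assumption: gzip stands in for whatever _compressionService.Zip does.
    public static byte[] Gzip(byte[] input)
    {
        using var output = new MemoryStream();
        // Dispose the GZipStream before reading the result so the trailer is flushed.
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
        {
            gzip.Write(input, 0, input.Length);
        }
        return output.ToArray();
    }

    // Returns pseudo-random bytes, which are effectively incompressible.
    public static byte[] RandomBytes(int count, int seed)
    {
        var buffer = new byte[count];
        new Random(seed).NextBytes(buffer);
        return buffer;
    }
}
```

Calling `CompressionDemo.Gzip(CompressionDemo.RandomBytes(100000, 1))` should return slightly more than 100000 bytes, because the gzip header, trailer and block framing are added on top of data that cannot shrink. Highly repetitive text, by contrast, shrinks dramatically.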

But the real problem is that you're taking the payload (inherently text) and compressing it, which produces arbitrary binary data. You're then serializing that binary data to JSON, and a byte[] is serialized in JSON as a base64 string, which increases the size by 1/3. (Every three bytes of data ends up as four characters of base64.)
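The figure in the question can be reproduced from that ratio alone. The sketch below is only arithmetic; `Base64Size`, `Base64Length` and `JsonLength` are hypothetical names, and it assumes the serializer writes the byte[] as a padded base64 string enclosed in two double quotes.

```csharp
using System;

public static class Base64Size
{
    // Base64 emits 4 characters for every 3 input bytes; the last group is padded.
    public static int Base64Length(int byteCount) => (byteCount + 2) / 3 * 4;

    // Assumption: the JSON form is the base64 text plus an opening and a closing quote.
    public static int JsonLength(int byteCount) => Base64Length(byteCount) + 2;
}
```

For the 195845 compressed bytes in the question, `Base64Size.JsonLength(195845)` gives 261128 + 2 = 261130, which is exactly the `compressedSerialisedInBytes` value reported. The base64 part can be cross-checked with `Convert.ToBase64String(new byte[195845]).Length`.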

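Given that your compression only reduced the size by ~1/4 (from 259122 to 195845 bytes, about 76% of the original), and base64 then multiplies that by 4/3, the overall result is a slightly larger JSON payload (0.76 × 4/3 ≈ 1.01). Nothing is wrong here other than expectations.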
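Put differently, compression only pays off once the compressed bytes fall below roughly three quarters of the original size. A rough check along those lines might look like the sketch below; `CompressionDecision` and `CompressionPaysOff` are hypothetical names, and the comparison ignores the quotes and escaping that JSON adds to the uncompressed string as well as any other message overhead.

```csharp
public static class CompressionDecision
{
    // Returns true when the base64 JSON form of the compressed bytes is smaller
    // than the UTF-8 size of the original text.
    // Assumption: the byte[] is serialised as quoted, padded base64.
    public static bool CompressionPaysOff(int originalUtf8Bytes, int compressedBytes)
    {
        int compressedAsJson = (compressedBytes + 2) / 3 * 4 + 2;
        return compressedAsJson < originalUtf8Bytes;
    }
}
```

With the numbers from the question, `CompressionDecision.CompressionPaysOff(259122, 195845)` is false. The compressed payload would have needed to be under about 194000 bytes for the compressed message to come out smaller.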
