Compress big numbers into small numbers. Reversible

Time: 09-10

I am looking for a compression algorithm for 10-digit integers (e.g. 5289412587). The aim is to reduce the compressed result to a number with as few digits as possible (e.g. 125). The algorithm must be reversible, so that from the result I can reconstruct the original number. Is there something that would work for me? Thank you.

CodePudding user response:

You can't compress all 10-digit numbers to something smaller and still be able to decompress them back to the originals. This should be obvious from counting. If you compressed them, say, to 9-digit numbers, then when you decompressed all of those 9-digit numbers, of which there are 10^9, you could only get back 1/10th of the original numbers, of which there are 10^10. 90% of the original numbers would not be reproduced.
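The counting (pigeonhole) argument above can be sketched in a few lines; the numbers used here follow the answer's example of mapping 10-digit values into 9-digit values:

```python
# Pigeonhole arithmetic behind the answer: there are 10**10 possible
# ten-digit inputs but only 10**9 nine-digit outputs, so any scheme
# mapping inputs to 9-digit outputs must send at least 10 different
# inputs to the same output, making lossless decompression impossible.
ten_digit_inputs = 10 ** 10    # 0000000000 .. 9999999999
nine_digit_outputs = 10 ** 9   # 000000000 .. 999999999

# Fraction of inputs that could ever round-trip losslessly:
coverage = nine_digit_outputs / ten_digit_inputs
print(coverage)  # 0.1 — the other 90% cannot be reproduced
```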

CodePudding user response:

No, and maybe.

Suppose the value you want to compress is represented by the variable n, and 0 <= n <= m. If the values of n are evenly distributed over the range, and all values in the range are possible, then it is not possible to have a lossless compression algorithm.

However, it might be that the possible values of n in your data are not evenly distributed. That is, some values are common, while other values are rare or never occur. If that is the case, there is a possibility you could use Huffman coding or something similar.

Huffman coding represents values using a variable number of bits: the more common values are represented by fewer bits, the less common values by more bits.

One way to use Huffman coding is to sample your data and create a "standard" table for use in all your coding and decoding of n. This can reduce processing time. If you don't create a "standard" table, you would build a table each time you compress: run through your data, count how many times each value of n occurs, and sort by number of occurrences. This requires more processing, but the table will be fine-tuned to that particular run. It also requires that the resulting table be carried along with the compressed data.
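A minimal sketch of the per-run table-building approach described above, assuming your data does contain repeated values (the sample list and the `huffman_codes` helper name are made up for illustration):

```python
import heapq
from collections import Counter

def huffman_codes(samples):
    """Build a prefix-free code table from sample values.

    Values that occur more often get shorter bit strings;
    rare values get longer ones.
    """
    freq = Counter(samples)  # count occurrences of each value
    # Heap entries are (count, tie_breaker, tree); a tree is either a
    # value (leaf) or a (left, right) pair. The tie_breaker keeps the
    # heap from ever comparing two trees directly.
    heap = [(n, i, v) for i, (v, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two least frequent subtrees.
        n1, _, t1 = heapq.heappop(heap)
        n2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, tie, (t1, t2)))
        tie += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"  # single-symbol edge case
    walk(heap[0][2], "")
    return codes

# Hypothetical sample: one 10-digit value dominates, so it receives
# a very short code, while the rare value receives a longer one.
samples = [5289412587] * 6 + [1234567890] * 3 + [9999999999]
codes = huffman_codes(samples)
print(codes)
```

With this sample, the most frequent value ends up with a 1-bit code, which is exactly the effect the answer describes: compression comes from skew in the value distribution, not from the values themselves being large.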

But, if your data doesn't have values that are more common than other values, Huffman coding won't be useful.
