Why is my random binary data generated in JavaScript highly compressible?

Time:12-08

My goal is to generate incompressible data in JavaScript, like the output of /dev/urandom.

I ran the following command on my Mac and got 31.5 MB of data that cannot be compressed with ZIP or RAR:

dd if=/dev/urandom of=file.txt bs=1048576 count=30 

But when I try to produce the same kind of data in JavaScript, the result is very easy to compress. WinRAR compresses it to less than 1 MB, and ZIP compresses it to 30 MB.

Here is my code

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>How to generate Incompressible data in Javascript like /dev/urandom</title>
</head>
<body>
    <script>
var ulDataSize = 30;  //Mb
var mData = new Float64Array(131072);
var n = mData.length;
        for (var i = 0; i < n; i++)
        {
            mData[i] = Math.random()*9;
        }
var uploadData = [];
        for (var i = 0; i < ulDataSize; i++) uploadData.push(mData);
        uploadData = new Blob(uploadData, { type: "application/octet-stream" });

function download(text, name, type) {
  var a = document.getElementById("a");
  //var file = new Blob([text], {type: type});
  a.href = URL.createObjectURL(uploadData);
  a.download = name;
}

    </script>


<a href="" id="a">click here to Download Incompressible data</a>
<button onclick="download('file text', 'Incompressibledata.txt', 'text/plain')"> Generate Data</button>

</body>
</html>

Answer:

Aside from the fact that you're repeating the same 1 MiB block 30 times (which is inherently non-random, and which an archiver with a large enough dictionary can detect), I don't think Math.random() * 9 produces values across the full range of bit patterns that an element in a Float64Array can hold, though my binary floating-point knowledge isn't quite up to supporting that doubt with data. If that's right, more bit patterns will be repeated, reducing the randomness.
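One way to put some data behind that doubt is a minimal sketch like the following. It looks only at the most significant byte of each double (the sign bit plus the top seven exponent bits) and collects the distinct values seen. The names topBytesOf and samples are hypothetical and not part of the question's code.

```javascript
// Collect the distinct values of the most significant byte of each double.
function topBytesOf(values) {
    const view = new DataView(new ArrayBuffer(8));
    const seen = new Set();
    for (const x of values) {
        view.setFloat64(0, x);       // DataView writes big-endian by default
        seen.add(view.getUint8(0));  // sign bit + top 7 bits of the exponent
    }
    return seen;
}

// Deterministic check: 1 is 0x3FF0..., 3 is 0x4008..., 8.5 is 0x4021...
console.log([...topBytesOf([1, 3, 8.5])].map(b => b.toString(16))); // ["3f", "40"]

// Same measurement on values produced the way the question produces them.
const samples = Array.from({length: 100000}, () => Math.random() * 9);
const topBytes = topBytesOf(samples);
console.log(topBytes.size, "distinct top bytes out of a possible 256");
```

Because every value lies in the range 0 to 9, the sign bit is always 0 and the exponent stays in a narrow band, so one byte in every eight takes only a handful of values. That alone gives a compressor something to work with, independently of the repeated block.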

To be sure, I'd probably use a Uint8Array with values in the full range 0 to 255, since then I'm sure I'm using all bit patterns; something like this:

function download(name) {
    const blocks = Array.from(
        {length: 30},
        () => Uint8Array.from(
            {length: 1024 * 1024},
            () => Math.floor(Math.random() * 256)
        )
    );
    const blob = new Blob(blocks, {type: "application/octet-stream"});
    const a = document.getElementById("a");
    a.href = URL.createObjectURL(blob);
    a.download = name;
}

Note that I've removed the unused text and type parameters.

When I use that, gzip (the tool I have handy) doesn't reduce the size of the resulting file (it goes from 31,457,280 bytes to 31,462,121 bytes).
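If no compression tool is handy, a rough sanity check can be done in the page itself by estimating the Shannon entropy of the bytes. This is only a sketch: byteEntropy is a hypothetical helper, and it measures the byte-value distribution only, so it would not notice a whole block being repeated 30 times the way an archiver can.

```javascript
// Estimate the Shannon entropy of a byte sequence, in bits per byte (0 to 8).
function byteEntropy(bytes) {
    const counts = new Array(256).fill(0);
    for (const b of bytes) counts[b]++;
    let entropy = 0;
    for (const c of counts) {
        if (c === 0) continue;           // unused byte values contribute nothing
        const p = c / bytes.length;      // observed probability of this byte value
        entropy -= p * Math.log2(p);
    }
    return entropy;
}

// One block generated the same way as in the download function above.
const randomBlock = Uint8Array.from(
    {length: 1024 * 1024},
    () => Math.floor(Math.random() * 256)
);
console.log(byteEntropy(randomBlock));          // should be very close to 8
console.log(byteEntropy(new Uint8Array(1024))); // constant data: 0
```

A value near 8 bits per byte means the byte values are spread evenly, which is a necessary but not sufficient condition for the data being incompressible.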

That's a lot of calls to Math.random(), but we're basically guaranteed that we fully explore the available bit patterns. You might be able to reduce the number of calls by using a Uint16Array or a Uint32Array. For instance:

const blockSize = (1024 * 1024) / 4;
const blocks = Array.from(
    {length: 30},
    () => Uint32Array.from(
        {length: blockSize},
        () => Math.random() * 4294967296
    )
);

For me, gzip still couldn't do anything with it (same result as above), but it took just over a quarter of the time to generate.

Using a Float64Array did not work for me. Apparently Math.random() doesn't explore the full range of bit patterns possible in a 64-bit IEEE-754 binary floating-point number, and/or the format has (probably small) unused ranges. So 32-bit ints may be the best compromise.
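Since the stated goal is data like /dev/urandom, another option that may be worth trying is crypto.getRandomValues, which the code above does not use. The sketch below is an assumption-laden alternative, not part of the original answer: randomBytes and its source parameter are hypothetical names, and the chunking is there because getRandomValues accepts at most 65,536 bytes per call.

```javascript
// Fill a Uint8Array of any size with cryptographically strong random bytes.
// `source` defaults to the global Web Crypto object (an assumption about the
// environment; pass another object exposing getRandomValues if needed).
function randomBytes(totalBytes, source = globalThis.crypto) {
    const out = new Uint8Array(totalBytes);
    const chunk = 65536; // per-call limit of getRandomValues, in bytes
    for (let offset = 0; offset < totalBytes; offset += chunk) {
        // subarray clamps the end index, so the last chunk may be shorter
        source.getRandomValues(out.subarray(offset, offset + chunk));
    }
    return out;
}
```

With that helper, the blocks could be built as Array.from({length: 30}, () => randomBytes(1024 * 1024)) and passed to the Blob constructor exactly as before. Whether this is faster or slower than the Math.random() versions would need to be measured.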
