Weird interaction with Python PIL image.save quality parameter

This is part of a project I'm currently working on. I am trying to convert a picture into text, and then convert the text back into a picture without any loss or extra size. First, I open the picture, read the pixels, and write them down as hex. The pictures are N x N pixels.

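from PIL import Image

import sys
import zlib

def rgb_to_hex(rgb):
    # Format an (R, G, B) tuple as six hex digits, e.g. (255, 0, 16) -> 'ff0010'
    return '%02x%02x%02x' % rgb

# Open the image first; convert to RGB so every pixel is a 3-tuple
im = Image.open(r"path\pic.png").convert("RGB")
N = im.width  # pictures are N x N

px = im.load()
read_pixels = ""
for i in range(N):        # rows (y)
    for j in range(N):    # columns (x)
        read_pixels += rgb_to_hex(px[j, i])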

Then, transform the string into bytes.

data = bytes.fromhex(read_pixels)
img = Image.frombytes("RGB", (N,N), data)
img.save("path\\new.png", quality=92)

According to the official Pillow documentation, quality ranges from 0 to 100, and values above 95 should be avoided. If nothing is set, the default value is 75.

For example, I used this picture. The original photo takes up 917 KB when downloaded. After it is converted by the program, the new picture takes up 911 KB. If I then run the new picture (911 KB) through the same program, I get back the same size, 911 KB; it does not shrink by a few more KB, and I do not know why. Why does this weird interaction happen only with the original 917 KB picture? Is there a way to get 100% of the original quality?

I also tried this on a random 512x512 .jpg picture. The original size of that picture is 67.4 KB, the next "generation" is 67.1 KB, and the one after that is 66.8 KB. Also, if I change quality to 93 or above (when using .jpg), the size goes up by a lot (at quality = 100, size > 135 KB). I was 'playing' around with the quality value and found that 92 gives the closest to the same size (<93 puts some extra KB for .jpg).

So with quality 92, the .PNG size stays the same after the first "generation", but with .jpg the size (and potentially the quality) goes down.

Is there something I am missing in my code? My best guess is that .PNG stores some extra information about the picture that is lost in the conversion, but I am not sure why the .jpg pictures decrease in size every generation. I tried a quality of 92.5, but the function does not accept decimal numbers as parameters.

User response:

Quick takeaways from the following explanations...

The quality parameter for PIL.Image.save isn't used when saving PNGs.
JPEG is generationally lossy, so as you keep re-saving images they will likely degrade in quality because the algorithm introduces more artifacts each time (among other things).
PNG is lossless and the file size differences you're seeing are due to PIL stripping metadata when you re-save your image.

Let's look at your PNG file first. PNG is a lossless format - the image data you give it will not suffer generational loss if you were to open it and re-save it as PNG over and over again.
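If you want to convince yourself of this, here is a minimal sketch that re-saves an image as PNG several times in memory and compares the final pixels with the starting ones. The pseudo-random test image and the generation count are assumptions made for illustration, not your actual picture.

```python
import io
import random

from PIL import Image

# Deterministic pseudo-random pixels stand in for a real photo (assumption).
rng = random.Random(0)
width = height = 32
original = Image.frombytes(
    "RGB", (width, height),
    bytes(rng.randrange(256) for _ in range(width * height * 3)),
)

current = original
for generation in range(5):
    buf = io.BytesIO()
    current.save(buf, format="PNG")   # write one PNG "generation" to memory
    buf.seek(0)
    current = Image.open(buf)
    current.load()                    # decode fully before moving on

# Compare raw pixel data of the last generation with the starting image
print(current.tobytes() == original.tobytes())
```

The comparison is on decoded pixel data, not on file bytes, so it says nothing about metadata or file size.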

The quality parameter isn't even recognized by PIL's PNG plugin - if you look at the _save function in PngImagePlugin.py, it is never referenced there.
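A quick way to check this without reading the plugin source is to save the same image as PNG with two very different quality values and compare the resulting bytes. This is only a sketch; the helper name png_bytes and the tiny solid-color image are made up for the example.

```python
import io

from PIL import Image


def png_bytes(img, **save_options):
    """Encode img as PNG in memory and return the file bytes (hypothetical helper)."""
    buf = io.BytesIO()
    img.save(buf, format="PNG", **save_options)
    return buf.getvalue()


img = Image.new("RGB", (16, 16), (200, 30, 30))

low = png_bytes(img, quality=1)     # would be very lossy for JPEG
high = png_bytes(img, quality=95)   # would be near-best for JPEG

# If the PNG writer ignores quality, both encodings are byte-for-byte equal
print(low == high)
```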

What's happening with your specific sample image is that Pillow is dropping some metadata when you re-save it in your code.

On my test system, I have your PNG saved as sample.png. I did a simple load-and-save with the following code (inside ipython) and saved the result as output.png:

In [1]: from PIL import Image
In [2]: img = Image.open("sample.png")
In [3]: img.save("output.png")

Now let's look at the differences between their metadata with ImageMagick:

#> diff <(magick identify -verbose output.png) <(magick identify -verbose sample.png)

7c7,9
<   Units: Undefined
---
>   Resolution: 94.48x94.48
>   Print size: 10.8383x10.8383
>   Units: PixelsPerCentimeter
74c76,78
<   Orientation: Undefined
---
>   Orientation: TopLeft
>   Profiles:
>     Profile-exif: 5218 bytes
76,77c80,81
<     date:create: 2022-08-12T21:27:13+00:00
<     date:modify: 2022-08-12T21:27:13+00:00
---
>     date:create: 2022-08-12T21:23:42+00:00
>     date:modify: 2022-08-12T21:23:31+00:00
78a83,85
>     exif:ImageDescription: IMGP5493_seamless_2.jpg
>     exif:ImageLength: 1024
>     exif:ImageWidth: 1024
84a92
>     png:pHYs: x_res=9448, y_res=9448, units=1
85a94,95
>     png:text: 1 tEXt/zTXt/iTXt chunks were found
>     png:text-encoded profiles: 1 were found
86a97
>     unknown: nomacs - Image Lounge 3.14
90c101
<   Filesize: 933730B
---
>   Filesize: 939469B
93c104
<   Pixels per second: 42.9936MP
---
>   Pixels per second: 43.7861MP

You can see there are metadata differences - PIL didn't retain some of the information when re-saving the image, especially some exif properties (you can see this PNG was actually converted from a JPG and the EXIF metadata was preserved in the conversion).

However, if you re-save the image with original image's info data...

In [1]: from PIL import Image
In [2]: img = Image.open("sample.png")
In [3]: img.save("output-with-info.png", info=img.info)

You'll see that the two files are exactly the same again:

❯ sha256sum output.png output-with-info.png
37ad78a7b7000c9430f40d63aa2f0afd2b59ffeeb93285b12bbba9c7c3dec4a2  output.png
37ad78a7b7000c9430f40d63aa2f0afd2b59ffeeb93285b12bbba9c7c3dec4a2  output-with-info.png
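To look at this from inside Python rather than through ImageMagick, the sketch below builds a small PNG with one text chunk, re-saves it plainly, and then re-saves it while passing the text along explicitly. It uses the PNG writer's documented pnginfo option, which is a different route from the info= call above; the "Software" key, its value, and the save_png helper are invented for the example.

```python
import io

from PIL import Image
from PIL.PngImagePlugin import PngInfo


def save_png(img, **save_options):
    """Save img as PNG into a rewound in-memory buffer (hypothetical helper)."""
    buf = io.BytesIO()
    img.save(buf, format="PNG", **save_options)
    buf.seek(0)
    return buf


# Build a source PNG carrying one tEXt chunk (made-up key and value)
meta = PngInfo()
meta.add_text("Software", "example-editor")
original = Image.open(save_png(Image.new("RGB", (8, 8), "red"), pnginfo=meta))
original.load()

# Plain re-save: nothing tells the writer about the text chunk
plain = Image.open(save_png(original))
plain.load()

# Re-save while copying string entries from .info into a new PngInfo
kept_meta = PngInfo()
for key, value in original.info.items():
    if isinstance(value, str):
        kept_meta.add_text(key, value)
kept = Image.open(save_png(original, pnginfo=kept_meta))
kept.load()

print("original:", original.info.get("Software"))
print("plain re-save:", plain.info.get("Software"))
print("re-save with pnginfo:", kept.info.get("Software"))
```

This only covers text chunks. Other metadata such as resolution or EXIF goes through separate save options, so treat this as a starting point rather than a complete metadata copy.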

Optionally Reducing PNG File Size

While lossless, the PNG format does allow for reducing the size of the image by specifying how aggressive the compression is (there are also more advanced things you could do like specifying a compression dictionary).

PIL exposes these options as optimize and compress_level under PNG options.

optimize

    If present and true, instructs the PNG writer to make the
    output file as small as possible. This includes extra 
    processing in order to find optimal encoder settings.

compress_level

    ZLIB compression level, a number between 0 and 9: 1 gives
    best speed, 9 gives best compression, 0 gives no 
    compression at all. Default is 6. When optimize option is 
    True compress_level has no effect (it is set to 9 regardless
    of a value passed).

And seeing it in action...

from PIL import Image

img = Image.open("sample.png")
img.save("optimized.png", optimize=True)

The resulting image I get is about 75K smaller than the original (843K vs. 918K).

❯ ls -lh optimized.png sample.png
-rw-r--r-- 1 wkl staff 843K Aug 12 18:10 optimized.png
-rw-r--r-- 1 wkl staff 918K Aug 12 17:23 sample.png
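If you want to confirm that these options change only the file size and never the pixels, something like the following sketch works. The synthetic gradient image and the helper name are assumptions for illustration; real photos will compress differently.

```python
import io

from PIL import Image


def png_size_and_pixels(img, **save_options):
    """Return (file size in bytes, decoded pixel bytes) for one PNG encoding."""
    buf = io.BytesIO()
    img.save(buf, format="PNG", **save_options)
    buf.seek(0)
    decoded = Image.open(buf)
    decoded.load()
    return len(buf.getvalue()), decoded.tobytes()


# Synthetic gradient stands in for a real picture (assumption)
width = height = 128
pixels = bytes(
    channel
    for y in range(height)
    for x in range(width)
    for channel in (x * 2 % 256, y * 2 % 256, (x + y) % 256)
)
img = Image.frombytes("RGB", (width, height), pixels)

settings = {
    "compress_level=0": {"compress_level": 0},
    "default": {},
    "compress_level=9": {"compress_level": 9},
    "optimize=True": {"optimize": True},
}

results = {}
for label, options in settings.items():
    results[label] = png_size_and_pixels(img, **options)
    size, decoded = results[label]
    # Size varies with the setting; decoded pixels should always match
    print(f"{label}: {size} bytes, pixels identical: {decoded == pixels}")
```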

JPEG File

Now, JPEG is a generationally lossy image format - each time you save it, you lose a bit more quality. It doesn't matter if later generations are saved at a higher quality than earlier ones, because the data lost in previous saves is already gone.
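You can watch this happen with a rough sketch like the one below, which re-saves an image as JPEG a few times in memory and reports the file size and the largest per-channel difference from the starting pixels. The noisy test image, the quality of 92 (borrowed from your code), and the helper name are assumptions; the exact numbers will depend on the image and the libjpeg build.

```python
import io
import random

from PIL import Image, ImageChops


def resave_jpeg(img, quality):
    """Encode img as JPEG in memory; return (decoded image, file size)."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    size = len(buf.getvalue())
    buf.seek(0)
    decoded = Image.open(buf)
    decoded.load()
    return decoded, size


# Deterministic noise stands in for a real photo (assumption)
rng = random.Random(0)
width = height = 64
original = Image.frombytes(
    "RGB", (width, height),
    bytes(rng.randrange(256) for _ in range(width * height * 3)),
)

current = original
for generation in range(1, 4):
    current, size = resave_jpeg(current, quality=92)
    diff = ImageChops.difference(original, current)
    # getextrema() gives one (min, max) pair per band for an RGB image
    worst = max(high for _low, high in diff.getextrema())
    print(f"generation {generation}: {size} bytes, max channel error {worst}")
```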

Note that the likely reason you saw file sizes balloon at quality=100 is that libjpeg/libjpeg-turbo (the underlying libraries PIL uses for JPEG) skip certain steps when the quality is set that high. I think quantization is effectively disabled, and quantization is the step that discards fine detail so that fewer bits are needed to encode the image.
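One way to poke at this, as far as I can tell, is to save the same image at a few quality levels and look at both the file size and the quantization tables Pillow reads back from the file (the quantization attribute of an opened JPEG). The noisy test image and the helper name are assumptions; treat the output as an illustration rather than a statement about every libjpeg build.

```python
import io
import random

from PIL import Image


def jpeg_stats(img, quality):
    """Return (file size, largest quantization divisor) for one JPEG encoding."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    reopened = Image.open(buf)
    # .quantization maps a table index to that table's 64 divisors;
    # larger divisors mean coarser quantization (more detail thrown away)
    largest_divisor = max(max(table) for table in reopened.quantization.values())
    return len(buf.getvalue()), largest_divisor


# Deterministic noise stands in for a real photo (assumption)
rng = random.Random(0)
width = height = 64
img = Image.frombytes(
    "RGB", (width, height),
    bytes(rng.randrange(256) for _ in range(width * height * 3)),
)

stats = {quality: jpeg_stats(img, quality) for quality in (75, 92, 95, 100)}
for quality, (size, largest_divisor) in stats.items():
    print(f"quality={quality}: {size} bytes, largest divisor {largest_divisor}")
```

If the divisors shrink toward 1 as quality approaches 100, that matches the idea that very little is being discarded at that setting, which is why the file grows.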