I'm investigating a bandwidth problem and stumbled on an issue with retrieving a bytea value. I tested this with PostgreSQL 10 and 14, the respective psql clients and the psycopg2 client library.
The issue is that if the size of a bytea value is eg. 10 MB (I can confirm by doing select length(value) from table where id=1
), and I do select value from table where id=1
, then the amount of data transferred over the socket is about 20MB. Note that the value in the database is pre-compressed (so high entropy), and the table is set to not compress the bytea value to avoid double work.
I can't find any obvious encoding issue since it's all just bytes. I can understand that the psql CLI command may negotiate some encoding so it can print the result, but psycopg2 definitely doesn't do that, and I experience the same behaviour.
I tested the same scenario with a text field, and that nearly worked as expected. I started with copy paste of lorem ipsum and it transferred the correct amount of data, but when I changed the text to be random extended ASCII values (higher entropy again), it transferred more data than it should have. I have compression disabled for all my columns so I don't understand why that would happen.
Any ideas as to why this would happen?
CodePudding user response:
That is normal. By default, values are transferred as strings, so a bytea
would be rendered in hexadecimal numbers, which doubles its size.
As a workaround, you could transfer such data in binary mode. The frontend-backend protocol and the C library offer support for that, but it will depend on your client API whether you can make use of that or not.