PostgreSQL bytea network traffic double expected value

I'm investigating a bandwidth problem and stumbled on an issue with retrieving a bytea value. I tested this with PostgreSQL 10 and 14, the respective psql clients and the psycopg2 client library.

The issue is that if the size of a bytea value is, e.g., 10 MB (which I can confirm by running select length(value) from table where id=1) and I run select value from table where id=1, then the amount of data transferred over the socket is about 20 MB. Note that the value in the database is pre-compressed (so high entropy), and the table is set to not compress the bytea value to avoid double work.

I can't find any obvious encoding issue since it's all just bytes. I can understand that the psql CLI command may negotiate some encoding so it can print the result, but psycopg2 definitely doesn't do that, and I experience the same behaviour.

I tested the same scenario with a text field, and that nearly worked as expected. I started with copy-pasted lorem ipsum text, and it transferred the correct amount of data, but when I changed the text to random extended ASCII values (higher entropy again), it transferred more data than it should have. I have compression disabled for all my columns, so I don't understand why that would happen.

Any ideas as to why this would happen?

CodePudding user response:

That is normal. By default, values are transferred in text format, so a bytea is rendered as hexadecimal digits (the default bytea_output setting is hex): two characters per byte plus a leading \x, which doubles its size.
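To make the arithmetic concrete, here is a minimal sketch that mimics the hex text rendering locally, without a database. The helper name bytea_hex_text is made up for illustration and is not part of any PostgreSQL client library.

```python
import os


def bytea_hex_text(value: bytes) -> bytes:
    """Mimic the text form of a bytea under bytea_output = 'hex'.

    Hypothetical helper for illustration only: a literal backslash-x
    prefix followed by two hex digits per input byte.
    """
    return b"\\x" + value.hex().encode("ascii")


payload = os.urandom(1024)      # stand-in for a pre-compressed blob
wire = bytea_hex_text(payload)  # what would travel in text format

print(len(payload), len(wire))  # 1024 2050
```

The ratio approaches 2 as the value grows, which matches the 10 MB versus roughly 20 MB observed in the question.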

As a workaround, you could transfer such data in binary mode. The frontend-backend protocol and the C library (libpq) both support that, but whether you can make use of it depends on your client API.
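As one possible illustration of a client API that exposes binary results, the sketch below uses psycopg 3 (the psycopg package), not the psycopg2 library from the question; as far as I know, psycopg2 always requests text-format results. The table name blobs, the connection string, and the function name are assumptions made up for this example.

```python
def fetch_bytea_binary(conninfo: str, row_id: int) -> bytes:
    """Fetch one bytea value using binary result format.

    Sketch only: assumes psycopg 3 is installed and that a hypothetical
    table blobs(id, value bytea) exists.
    """
    import psycopg  # psycopg 3; imported lazily so the sketch loads without it

    with psycopg.connect(conninfo) as conn:
        # binary=True asks the server to return results in binary format,
        # so the bytea arrives as raw bytes instead of hex text.
        with conn.cursor(binary=True) as cur:
            cur.execute("SELECT value FROM blobs WHERE id = %s", (row_id,))
            row = cur.fetchone()
            if row is None:
                raise LookupError(f"no row with id={row_id}")
            return row[0]
```

Calling fetch_bytea_binary("dbname=test", 1) against a real server should then move roughly the stored size over the socket rather than twice that, though it is worth confirming with the same measurement used in the question.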