I have a table place2022
which has a very long CHAR
column
timestamp | user_id | pixel_color | coordinate
----------------- ------------------------------------------------------------------------------------------ ------------- ------------
17:38:20.021 00 | p0sXpmkcmg1KLiCdK5e4xKdudb1f8cjscGs35082sKpGBfQIw92nZ7yGvWbQ/ggB1 kkRBaYu1zy6n16yL/yjA== | #FF4500 | 371,488
17:38:20.024 00 | Ctar52ln5JEpXT tVVc8BtQwm1tPjRwPZmPvuamzsZDlFDkeo3 ItUW89J1rXDDeho6A4zCob1MKmJrzYAjipg== | #51E9F4 | 457,493
17:38:20.025 00 | rNMF5wpFYT2RAItySLf9IcFZwOhczQhkRhmTD4gv0K78DpieXrVUw8T/MBAZjj2BIS8h5exPISQ4vlyzLzad5w== | #000000 | 65,986
17:38:20.025 00 | u0a7l8hHVvncqYmav27EARAE6ciLtpUTPXMI33lDrUmtj5Ei3ixlfRuG28KUvs7r5LpeiE/iOKPALVjkILhrYg== | #3690EA | 73,961
The user_id
s are already hashes, so all I really care about here is having some sort of id
column which is 1-1 with the user_id
.
I've counted the number of unique user_id
s, which is 10381163
, which fits into 24 bits. Therefore, I can compress the id
field down to a 32-bit integer using the obvious scheme of "Assign 1 to the first new user_id you see, 2 to the second new user_id you see", etc. I don't even care that the user_id's are mapped in the order that they're seen: I just need them to be mapped in an invertible manner to 32-bit ints somehow. I'd also like to persist this mapping somewhere so that, if I want to, I can go backwards.
What would be the best way to achieve this? I imagine that we could create a new table (create table place2022_user_ids as select distinct(user_id) from place2022;
?) and then reverse-lookup the user_id
column in that table, but I don't know quite how to formulate the queries and also make sure that I'm not doing something ridiculously slow.
I am using postgresql, if it matters.
CodePudding user response:
If you have a recent (>8) version of Postgres you can add an auto increment id column to an existing table.
ALTER TABLE place2022
ADD COLUMN id SERIAL PRIMARY KEY;
NB If the existing column is a PRIMARY KEY you will need to drop it first.
See drop primary key constraint in postgresql by knowing schema and table name only