Home > Software design >  SQL: String column to unique integer?
SQL: String column to unique integer?

Time:04-08

I have a table place2022 which has a very long CHAR column

    timestamp    |                                         user_id                                          | pixel_color | coordinate 
----------------- ------------------------------------------------------------------------------------------ ------------- ------------
 17:38:20.021 00 | p0sXpmkcmg1KLiCdK5e4xKdudb1f8cjscGs35082sKpGBfQIw92nZ7yGvWbQ/ggB1 kkRBaYu1zy6n16yL/yjA== | #FF4500     | 371,488
 17:38:20.024 00 | Ctar52ln5JEpXT tVVc8BtQwm1tPjRwPZmPvuamzsZDlFDkeo3 ItUW89J1rXDDeho6A4zCob1MKmJrzYAjipg== | #51E9F4     | 457,493
 17:38:20.025 00 | rNMF5wpFYT2RAItySLf9IcFZwOhczQhkRhmTD4gv0K78DpieXrVUw8T/MBAZjj2BIS8h5exPISQ4vlyzLzad5w== | #000000     | 65,986
 17:38:20.025 00 | u0a7l8hHVvncqYmav27EARAE6ciLtpUTPXMI33lDrUmtj5Ei3ixlfRuG28KUvs7r5LpeiE/iOKPALVjkILhrYg== | #3690EA     | 73,961

The user_ids are already hashes, so all I really care about here is having some sort of id column which is 1-1 with the user_id.

I've counted the number of unique user_ids, which is 10381163, which fits into 24 bits. Therefore, I can compress the id field down to a 32-bit integer using the obvious scheme of "Assign 1 to the first new user_id you see, 2 to the second new user_id you see", etc. I don't even care that the user_id's are mapped in the order that they're seen: I just need them to be mapped in an invertible manner to 32-bit ints somehow. I'd also like to persist this mapping somewhere so that, if I want to, I can go backwards.

What would be the best way to achieve this? I imagine that we could create a new table (create table place2022_user_ids as select distinct(user_id) from place2022;?) and then reverse-lookup the user_id column in that table, but I don't know quite how to formulate the queries and also make sure that I'm not doing something ridiculously slow.

I am using postgresql, if it matters.

CodePudding user response:

If you have a recent (>8) version of Postgres you can add an auto increment id column to an existing table.

ALTER TABLE place2022 
ADD COLUMN id SERIAL PRIMARY KEY;

NB If the existing column is a PRIMARY KEY you will need to drop it first.
See drop primary key constraint in postgresql by knowing schema and table name only

  • Related