I'm creating masked data from our production database and part of that process is masking all the UUIDs with the function uuid_generate_v5(UUID, "namespace")
. This helps create masked data with valid UUIDs as well as making them be able to generate the same each time across all the tables of our database.
It has the added benefit of allowing me to go from a production UUID to a masked ID for testing.
What I'm not sure about is if this function is reversible. Can I go from my masked UUID back to the production ID. My thought (and hope) is that I can't.
Example:
SELECT uuid_generate_v5('0015b178-6102-43bc-89a8-5f7cafe78f2f', 'mask');
-- will return: 'fdb8a7dd-909b-544d-8340-41358c6b3495'
Can I go from the value 'fdb8a7dd-909b-544d-8340-41358c6b3495'
back to '0015b178-6102-43bc-89a8-5f7cafe78f2f'
? using the uuid_generate_v5() function?
CodePudding user response:
I'm not sure I can put this better than the Wikipedia article on UUIDs:
Version-3 and version-5 UUIDs are generated by hashing a namespace identifier and name. Version 3 uses MD5 as the hashing algorithm, and version 5 uses SHA-1.
...
Version-3 and version-5 UUIDs have the property that the same namespace and name will map to the same UUID. However, neither the namespace nor name can be determined from the UUID, even if one of them is specified, except by brute-force search.
In other words, the answer is the same as "Is it possible to reverse a SHA-1 hash? - no (at least by design; a flaw may eventually be found in the SHA-1 algorithm that makes it more feasible).
This is confirmed by the PostgreSQL manual for the function you're using:
uuid_generate_v3
... The name parameter will be MD5-hashed, so the cleartext cannot be derived from the generated UUID.
uuid_generate_v5
... Generates a version 5 UUID, which works like a version 3 UUID except that SHA-1 is used as a hashing method.
However, your use of the function is not really what is intended, because you have mixed up the two parameters; the signature is documented as:
uuid_generate_v3 ( namespace uuid, name text ) → uuid
It is the UUID passed first which is the "namespace", with pre-defined values that can identify that the name used was a DNS name, URL, etc. The second parameter is not a "namespace" or "mask", but "an identifier in the selected namespace".
So, while your method will work, what you're doing is treating every input UUID as a namespace, and creating a UUID for the same name in each of them. The intended use is that you use a fixed UUID as the namespace, and the value you're masking (which happens to be a UUID in string form) as the name.
SELECT uuid_generate_v5(somehow_get_mask_namespace_uuid(), '0015b178-6102-43bc-89a8-5f7cafe78f2f');