I need a way to generate a unique alphanumeric ID column for my dataframe. The IDs must remain the same even if I run the script again in the future.
Name
Sam
Pray
Brad
I can generate the IDs based on this post, but I need 5-character alphanumeric values that will always remain the same.
This is desired output:
Name ID
Sam X25TR
Peter WE558
Pepe TR589
CodePudding user response:
One way would be to hash the name with whatever hashing algorithm you like and keep the first five characters of the hash. Keep in mind that with such a short hash, collisions (the same output for different inputs) become likely once you have enough data.
Something along these lines:
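import hashlib

def get_id(name: str) -> str:
    # MD5 is used here only as a stable fingerprint, not for security
    digest = hashlib.md5(name.encode())
    # hexdigest() yields characters 0-9 and a-f; keep the first five
    return digest.hexdigest()[:5]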
Now for a given input string, get_id returns an alphanumeric 5-character string which is always the same for the same input.
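To make the collision caveat concrete, here is a minimal sketch that is not part of the answer above. It assumes SHA-256 plus base32 encoding in place of the MD5 hex digest, so that each character carries 5 bits (32^5 = 33,554,432 possible IDs instead of 16^5 = 1,048,576). The names stable_id, find_collisions and collision_probability are hypothetical helpers introduced only for illustration.

```python
import base64
import hashlib
import math
from collections import defaultdict


def stable_id(name: str, length: int = 5) -> str:
    """Return the first `length` base32 characters of the SHA-256 digest of `name`."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Base32 uses only A-Z and 2-7, so the result is upper-case alphanumeric.
    return base64.b32encode(digest).decode("ascii")[:length]


def find_collisions(names):
    """Return {id: set of names} for every ID shared by more than one distinct name."""
    by_id = defaultdict(set)
    for name in names:
        by_id[stable_id(name)].add(name)
    return {id_: group for id_, group in by_id.items() if len(group) > 1}


def collision_probability(n: int, space: int) -> float:
    """Birthday-problem approximation: chance that n random IDs from `space` values collide."""
    return -math.expm1(-n * (n - 1) / (2 * space))


names = ["Sam", "Pray", "Brad"]
ids = {name: stable_id(name) for name in names}  # same result on every run
collisions = find_collisions(names)              # empty dict if all IDs differ

# Approximate collision risk for 1000 distinct names with 5 hex characters
p_hex = collision_probability(1000, 16 ** 5)     # roughly 0.38
p_b32 = collision_probability(1000, 32 ** 5)     # smaller, thanks to the larger ID space
```

With pandas, the same function could be applied with df['ID'] = df['Name'].map(stable_id). If find_collisions returns anything, a longer ID or a stored lookup table would be needed to keep the IDs unique.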
CodePudding user response:
This function generates a random alphanumeric string of a given length:
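import math
import secrets

def random_alphanum(length: int) -> str:
    # token_hex(n) returns 2*n hex characters, so round the byte count up
    text = secrets.token_hex(nbytes=math.ceil(length / 2))
    is_even = length % 2 == 0
    # For odd lengths, drop one character to hit the requested length
    return text if is_even else text[1:]

df['ID'] = random_alphanum(5)  # assigns the same value to every row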
To generate a separate value for each row:
df2['ID'] = df2.apply(lambda row: random_alphanum(5), axis=1)
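Because these values are random, they will differ each time the script runs unless they are stored somewhere. A minimal sketch of one way to keep them stable, assuming a JSON file is an acceptable place to save the name-to-ID mapping (the function name load_or_create_ids and the file name ids.json are hypothetical, not from the answer above):

```python
import json
import secrets
from pathlib import Path


def load_or_create_ids(names, path, length=5):
    """Reuse IDs saved at `path`; create random ones only for names not seen before."""
    path = Path(path)
    # Load the mapping saved by a previous run, if there is one
    mapping = json.loads(path.read_text()) if path.exists() else {}
    used = set(mapping.values())
    for name in names:
        if name in mapping:
            continue  # keep the ID assigned in an earlier run (or earlier row)
        new_id = secrets.token_hex(length)[:length]
        while new_id in used:  # retry until the ID is unique
            new_id = secrets.token_hex(length)[:length]
        mapping[name] = new_id
        used.add(new_id)
    # Save the updated mapping so the next run returns the same IDs
    path.write_text(json.dumps(mapping, indent=2))
    return mapping
```

With pandas this could be used as df2['ID'] = df2['Name'].map(load_or_create_ids(df2['Name'], 'ids.json')). The IDs then stay the same for as long as the file is kept alongside the script.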
CodePudding user response:
Here's my attempt
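import secrets
import pandas as pd

# Draw random 5-character hex strings until there is one unique ID per row
ids = []
while len(ids) < df.shape[0]:
    temp = secrets.token_hex(5)[:5]
    if temp not in ids:
        ids.append(temp)

# Join on the group number of each Name, so equal names share one ID
df = df.merge(pd.DataFrame({'ID': ids}).reset_index(), left_on=df.groupby(['Name']).ngroup(), right_on='index')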