How to add unique alphanumeric id for pandas dataframe?

Time:10-08

I need a way to generate a unique alphanumeric ID column for my DataFrame. The IDs must stay the same even if I run the script again in the future.

    Name
    Sam
    Pray
    Brad

I can generate the IDs based on this post, but I need 5-character alphanumeric values that always remain the same.

This is desired output:

    Name         ID
    Sam          X25TR
    Peter        WE558
    Pepe         TR589

CodePudding user response:

One way would be to hash the name with any hashing algorithm and keep the first five characters of the hex digest. Keep in mind that with such a short hash, collisions (the same output for different inputs) become likely once you have enough data.

Something along these lines:

import hashlib

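def get_id(name: str) -> str:
    # MD5 is deterministic, so the same name always yields the same ID
    digest = hashlib.md5(name.encode())
    # Keep the first five hex characters (0-9, a-f) of the digest
    return digest.hexdigest()[:5]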

Now for a given input string, get_id returns an alphanumeric 5-character string which is always the same for the same input.
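
As a rough sketch of how the same idea could cover the full 0-9/A-Z range seen in the desired output, and how to check for the collisions mentioned above. The names `stable_id`, `find_collisions` and `collision_probability` are made up for this illustration, SHA-256 is swapped in for MD5 (any stable hash should do), and the probability uses the standard birthday approximation rather than an exact count.

```python
import hashlib
import math
import string

ALPHABET = string.digits + string.ascii_uppercase  # 36 symbols: 0-9, A-Z


def stable_id(name: str, length: int = 5) -> str:
    """Map a name to a fixed-length uppercase alphanumeric ID, deterministically."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Reduce the digest to a number in [0, 36**length)
    number = int.from_bytes(digest, "big") % (len(ALPHABET) ** length)
    chars = []
    for _ in range(length):
        number, remainder = divmod(number, len(ALPHABET))
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def find_collisions(names):
    """Return {id: [names]} for every ID shared by more than one distinct name."""
    seen = {}
    for name in set(names):
        seen.setdefault(stable_id(name), []).append(name)
    return {key: sorted(value) for key, value in seen.items() if len(value) > 1}


def collision_probability(n_items: int, id_space: int) -> float:
    """Birthday approximation: chance that at least two of n_items share an ID."""
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2 * id_space))


names = ["Sam", "Pray", "Brad"]  # sample names from the question
ids = {name: stable_id(name) for name in names}
collisions = find_collisions(names)
print(ids, collisions)
```

With pandas, something like `df['ID'] = df['Name'].map(stable_id)` should fill the column, assuming the `Name` column from the question. For about 1,000 distinct names, the approximation puts the chance of at least one collision among 5 hex characters (16**5 possible values) at a bit under 40%, so the collision check is worth running before trusting the IDs.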

CodePudding user response:

This function generates a random alphanumeric string of a given length:

import math
import secrets


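def random_alphanum(length: int) -> str:
    # token_hex returns two hex characters per byte, so round up
    text = secrets.token_hex(nbytes=math.ceil(length / 2))
    is_even = length % 2 == 0
    # Drop one character when an odd length was requested
    return text if is_even else text[1:]

# Assigns the same single ID to every row
df['ID'] = random_alphanum(5)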

To generate a separate ID for each row:

df2['ID'] = df2.apply(lambda x: random_alphanum(5), axis=1, result_type="expand")

CodePudding user response:

Here's my attempt

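import secrets

import pandas as pd

# Generate one unique 5-character hex ID per row
ids = []
while len(ids) < df.shape[0]:
    temp = secrets.token_hex(5)[:5]
    if temp not in ids:
        ids.append(temp)

# Give each distinct Name the ID at its group number, so repeated names share an ID
df.merge(pd.DataFrame({'ID': ids}).reset_index(), left_on=df.groupby(['Name']).ngroup(), right_on='index')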
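
Random IDs like these are unique within one run but come out different on the next run, which is at odds with the question's requirement. One possible way around that, sketched here under assumptions, is to store the name-to-ID mapping in a file and reuse it. The helper names `load_mapping` and `assign_ids` and the file name `name_ids.json` are hypothetical, and a temporary directory stands in for wherever the file would really live.

```python
import json
import secrets
import string
import tempfile
from pathlib import Path

ALPHABET = string.ascii_uppercase + string.digits  # A-Z and 0-9


def load_mapping(path: Path) -> dict:
    """Read a previously saved name -> ID mapping, or start empty."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def assign_ids(names, path: Path, length: int = 5) -> dict:
    """Keep existing IDs, create unique random IDs for new names, save the result."""
    mapping = load_mapping(path)
    used = set(mapping.values())
    for name in names:
        if name in mapping:
            continue  # this name already has an ID from an earlier run
        new_id = "".join(secrets.choice(ALPHABET) for _ in range(length))
        while new_id in used:  # retry on the rare clash with an existing ID
            new_id = "".join(secrets.choice(ALPHABET) for _ in range(length))
        mapping[name] = new_id
        used.add(new_id)
    path.write_text(json.dumps(mapping, indent=2), encoding="utf-8")
    return mapping


# Two calls against the same file stand in for two separate runs of the script
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "name_ids.json"  # hypothetical file name
    first = assign_ids(["Sam", "Pray", "Brad"], store)
    second = assign_ids(["Sam", "Pray", "Brad", "Peter"], store)

print(first["Sam"] == second["Sam"])
```

The resulting dictionary could then be applied with something like `df['ID'] = df['Name'].map(mapping)`, again assuming the `Name` column from the question. Unlike the hashing approach, this avoids collisions entirely, at the cost of having to keep the mapping file around.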