How to have a randomly generated unique ID in multiple rows in pandas?

Time:04-28

I have tried to wrap my head around this, but I have hit a roadblock. I have a script that creates a data frame from a predefined list of request types (requestTypes) and a SHA-256-hashed u_id for each row. I would like to edit the data_generator function so that the same u_id can appear with multiple request types.

import hashlib
import random

import pandas as pd

requestTypes = ["type1","type2","type3","type"]

def rowValue(x, y):
    return {'request_type': x,
            'u_id': y
            }

def data_generator():
    temp_list = []
    for i in range(0, 5):
        temp_requestType = random.choice(requestTypes)
        uid_value = hashlib.sha256((str(i)).encode('UTF-8')).hexdigest()

        temp_list.append(rowValue(temp_requestType, uid_value))

    return pd.DataFrame(temp_list)

df = data_generator()
df

This is the result I am getting:

    request_type    u_id
0   type1           6cd5b6e51936a442b973660c21553dd22bd72ddc875113...
1   type2           b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d6...
2   type3           535fa30d7e25dd8a49f1536779734ec8286108d115da50...
3   type2           0b918943df0962bc7a1824c0555a389347b4febdc7cf9d...
4   type3           73475cb40a568e8da8a045ced110137e159f890ac4da88...

This is the result I want:

    request_type    u_id
0   type1           6cd5b6e51936a442b973660c21553dd22bd72ddc875113...
1   type2           b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d6...
2   type3           6cd5b6e51936a442b973660c21553dd22bd72ddc875113...
3   type2           0b918943df0962bc7a1824c0555a389347b4febdc7cf9d...
4   type3           0b918943df0962bc7a1824c0555a389347b4febdc7cf9d...

CodePudding user response:

If you just want to generate a random UUID for each request type, you can try this:

from uuid import uuid4

import numpy as np
import pandas as pd

requestTypes = ["type1","type2","type3","type5"]
hashes = [uuid4() for _ in range(len(requestTypes))]

t = np.random.choice(len(requestTypes), 5)
df = pd.DataFrame({
    "request_type": np.array(requestTypes)[t],
    "u_id": np.array(hashes)[t]
})

If you want to use your original hashing algorithm:

import hashlib

import numpy as np
import pandas as pd

requestTypes = ["type1","type2","type3","type5"]
hashes = [hashlib.sha256((str(i)).encode('UTF-8')).hexdigest() for i in range(len(requestTypes))]

t = np.random.choice(len(requestTypes), 5)
df = pd.DataFrame({
    "request_type": np.array(requestTypes)[t],
    "u_id": np.array(hashes)[t]
})
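
Note that both snippets above give each request type exactly one u_id, because the same index array t selects from both lists. If the goal is instead the table in the question, where one u_id shows up with several request types, one possible approach is to draw the ID and the request type independently, with the IDs coming from a pool smaller than the number of rows. The following is only a rough sketch under that assumption: the names generate_rows, n_users and seed are made up for illustration, and it builds the same kind of list of dicts as temp_list in the question.

```python
import hashlib
import random

request_types = ["type1", "type2", "type3"]


def generate_rows(n_rows=5, n_users=2, seed=None):
    """Hypothetical helper: build n_rows dicts whose u_id is drawn from n_users hashes."""
    rng = random.Random(seed)  # pass a seed for reproducible output
    # Pool of IDs, hashed the same way as in the question.
    pool = [
        hashlib.sha256(str(i).encode("UTF-8")).hexdigest()
        for i in range(n_users)
    ]
    # Request type and u_id are chosen independently of each other,
    # so a single u_id is free to appear with different request types.
    return [
        {"request_type": rng.choice(request_types), "u_id": rng.choice(pool)}
        for _ in range(n_rows)
    ]


rows = generate_rows(n_rows=5, n_users=2)
# In the question's script this list would be passed to pd.DataFrame(rows).
```

With fewer IDs than rows, at least one u_id has to repeat. Whether it repeats with a different request type is left to chance on any single run, so a larger n_rows makes that more likely.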