Python pandas: Change value for each row with random string-CodePudding

I need to change a csv file in order to generate a random string value to each row:

This is my code by now:

patients = pd.read_csv("patients.csv")

# updating the column value/data
patients['VALOR_ID'] = patients['VALOR_ID'].str.replace('^(\w |)',generate_cip())
  
# writing into the file
patients.to_csv("patients-writer.csv", index=False)

The problem here is that all rows end up having same value.

Any ideas about how to apply generate_cip for each row?

Note:

I need to use replace since row value has this format:

GMCP0200611068|46977549A|81132941070

and I need to change only right before | string part.

For example:

NIF,CIP,FORMAT_ID,VALOR_ID
39999384T,MAMO28374657001,CIP|NASS,XXXXX|2343434|81132941070
39576383R,CACO56874934005,CIP|NASS,XXXXX|39283744|81132941070

My desired output is that XXXXX = generated_cip()

39999384T,MAMO28374657001,CIP|NASS,generated_cip()|2343434|81132941070
39576383R,CACO56874934005,CIP|NASS,generated_cip()|39283744|81132941070

Any ideas?

CodePudding user response：

Try:

import pandas as pd


# toy function
def generate_cip():
    import random
    from string import ascii_uppercase
    return "XXX"   "".join(random.sample(ascii_uppercase, 2))


# toy data
patients = pd.DataFrame.from_dict(
    {'NIF': {0: '39999384T', 1: '39576383R'}, 'CIP': {0: 'MAMO28374657001', 1: 'CACO56874934005'},
     'FORMAT_ID': {0: 'CIP|NASS', 1: 'CIP|NASS'},
     'VALOR_ID': {0: 'XXXXX|2343434|81132941070', 1: 'XXXXX|39283744|81132941070'}})

patients['VALOR_ID'] = patients['VALOR_ID'].str.replace('^(\w |)', lambda x: generate_cip(), regex=True)
print(patients)

Output

         NIF              CIP FORMAT_ID                    VALOR_ID
0  39999384T  MAMO28374657001  CIP|NASS   XXXVH|2343434|81132941070
1  39576383R  CACO56874934005  CIP|NASS  XXXKB|39283744|81132941070

The repl argument of Series.str.replace can be a callable, from the documentation:

repl str or callable Replacement string or a callable. The callable is passed the regex match object and must return a replacement string to be used. See re.sub().

CodePudding user response：

I think you want to replace the 'XXXXX' with the respective CIP s. You can try this solution:

import pandas as pd

# toy data
patients = pd.DataFrame.from_dict(
    {'NIF': {0: '39999384T', 1: '39576383R'}, 'CIP': {0: 'MAMO28374657001', 1: 'CACO56874934005'},
     'FORMAT_ID': {0: 'CIP|NASS', 1: 'CIP|NASS'},
     'VALOR_ID': {0: 'XXXXX|2343434|81132941070', 1: 'XXXXX|39283744|81132941070'}})

#Try: 

for index,row in patients.iterrows():
    cip = row['CIP']
    row['VALOR_ID'] = row['VALOR_ID'].replace('XXXXX',cip)

Output:

         NIF              CIP FORMAT_ID                              VALOR_ID
0  39999384T  MAMO28374657001  CIP|NASS   MAMO28374657001|2343434|81132941070
1  39576383R  CACO56874934005  CIP|NASS  CACO56874934005|39283744|81132941070