I need to change a csv file in order to generate a random string value to each row:
This is my code by now:
patients = pd.read_csv("patients.csv")
# updating the column value/data
patients['VALOR_ID'] = patients['VALOR_ID'].str.replace('^(\w |)',generate_cip())
# writing into the file
patients.to_csv("patients-writer.csv", index=False)
The problem here is that all rows end up having same value.
Any ideas about how to apply generate_cip
for each row?
Note:
I need to use replace
since row value has this format:
GMCP0200611068|46977549A|81132941070
and I need to change only right before |
string part.
For example:
NIF,CIP,FORMAT_ID,VALOR_ID
39999384T,MAMO28374657001,CIP|NASS,XXXXX|2343434|81132941070
39576383R,CACO56874934005,CIP|NASS,XXXXX|39283744|81132941070
My desired output is that XXXXX
= generated_cip()
39999384T,MAMO28374657001,CIP|NASS,generated_cip()|2343434|81132941070
39576383R,CACO56874934005,CIP|NASS,generated_cip()|39283744|81132941070
Any ideas?
CodePudding user response:
Try:
import pandas as pd
# toy function
def generate_cip():
import random
from string import ascii_uppercase
return "XXX" "".join(random.sample(ascii_uppercase, 2))
# toy data
patients = pd.DataFrame.from_dict(
{'NIF': {0: '39999384T', 1: '39576383R'}, 'CIP': {0: 'MAMO28374657001', 1: 'CACO56874934005'},
'FORMAT_ID': {0: 'CIP|NASS', 1: 'CIP|NASS'},
'VALOR_ID': {0: 'XXXXX|2343434|81132941070', 1: 'XXXXX|39283744|81132941070'}})
patients['VALOR_ID'] = patients['VALOR_ID'].str.replace('^(\w |)', lambda x: generate_cip(), regex=True)
print(patients)
Output
NIF CIP FORMAT_ID VALOR_ID
0 39999384T MAMO28374657001 CIP|NASS XXXVH|2343434|81132941070
1 39576383R CACO56874934005 CIP|NASS XXXKB|39283744|81132941070
The repl argument of Series.str.replace
can be a callable, from the documentation:
repl str or callable Replacement string or a callable. The callable is passed the regex match object and must return a replacement string to be used. See re.sub().
CodePudding user response:
I think you want to replace the 'XXXXX' with the respective CIP s. You can try this solution:
import pandas as pd
# toy data
patients = pd.DataFrame.from_dict(
{'NIF': {0: '39999384T', 1: '39576383R'}, 'CIP': {0: 'MAMO28374657001', 1: 'CACO56874934005'},
'FORMAT_ID': {0: 'CIP|NASS', 1: 'CIP|NASS'},
'VALOR_ID': {0: 'XXXXX|2343434|81132941070', 1: 'XXXXX|39283744|81132941070'}})
#Try:
for index,row in patients.iterrows():
cip = row['CIP']
row['VALOR_ID'] = row['VALOR_ID'].replace('XXXXX',cip)
Output:
NIF CIP FORMAT_ID VALOR_ID
0 39999384T MAMO28374657001 CIP|NASS MAMO28374657001|2343434|81132941070
1 39576383R CACO56874934005 CIP|NASS CACO56874934005|39283744|81132941070