How to select a particular string from a column value and assign it to a new column in pandas

Time:03-30

I have these two lists:

docs = ['doc1', 'doc2', 'doc3']

nurses = ['nur1', 'nur2', 'nur3']

My DataFrame df looks like this:

1st responder   Associates
doc1            nur1, nur2
nur1            doc1, nur2
nur3            nur1, nur2, doc2
doc2            nur2, nur1

I want to add a new column 'Doctor' containing the doctor taken from either '1st responder' or 'Associates', so that the resulting df looks like this:

1st responder   Associates         Doctor
doc1            nur1, nur2         doc1
nur1            doc1, nur2         doc1
nur3            nur1, nur2, doc2   doc2
doc2            nur2, nur1         doc2

Should I use apply/map or is there a vectorised method?
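For comparison, here is a minimal sketch of the row-wise apply approach mentioned in the question. It assumes every cell is a plain string and that associates are separated by ', '; the helper name find_doctor is hypothetical.

```python
import pandas as pd

docs = ['doc1', 'doc2', 'doc3']
df = pd.DataFrame({
    '1st responder': ['doc1', 'nur1', 'nur3', 'doc2'],
    'Associates': ['nur1, nur2', 'doc1, nur2', 'nur1, nur2, doc2', 'nur2, nur1'],
})

doc_set = set(docs)  # set lookup is O(1) per name


def find_doctor(row):
    # Check '1st responder' first, then each comma-separated associate.
    candidates = [row['1st responder']] + row['Associates'].split(', ')
    for name in candidates:
        if name in doc_set:
            return name
    return None  # no doctor in this row


df['Doctor'] = df.apply(find_doctor, axis=1)
print(df)
```

apply(axis=1) is a Python-level loop over rows, so on large frames a plain list comprehension over zip(df['1st responder'], df['Associates']) with the same logic is usually faster; the answers below avoid the per-row function call altogether.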

Answer:

You can use:

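docs = ['doc1', 'doc2', 'doc3']

# Split 'Associates' into lists, stack both columns into one 'value' column
# (keeping the original row index), explode to one name per row, keep only
# the doctors, and take the first doctor found for each original row.
df['Doctor'] = (df.assign(Associates=df['Associates'].str.split(', '))
                  .melt(ignore_index=False)
                  .explode('value')
                  .query('value.isin(@docs)')
                  .groupby(level=0)['value'].first())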

Output:

1st responder   Associates         Doctor
doc1            nur1, nur2         doc1
nur1            doc1, nur2         doc1
nur3            nur1, nur2, doc2   doc2
doc2            nur2, nur1         doc2
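To see what each step of this chain does, here is a self-contained sketch run on the question's data. It assumes pandas 1.1 or later (melt(ignore_index=False) needs it) and replaces the query call with equivalent boolean indexing; the intermediate names long_df and doctors are only for illustration.

```python
import pandas as pd

docs = ['doc1', 'doc2', 'doc3']
df = pd.DataFrame({
    '1st responder': ['doc1', 'nur1', 'nur3', 'doc2'],
    'Associates': ['nur1, nur2', 'doc1, nur2', 'nur1, nur2, doc2', 'nur2, nur1'],
})

# 1. Turn each 'Associates' string into a list of names.
# 2. melt stacks both columns into 'variable'/'value' pairs, keeping the
#    original row label as the index ('1st responder' rows come first).
# 3. explode gives one name per row; plain strings are left unchanged.
long_df = (df.assign(Associates=df['Associates'].str.split(', '))
             .melt(ignore_index=False)
             .explode('value'))

# Keep only doctor names, then take the first one per original row label.
doctors = long_df[long_df['value'].isin(docs)]
df['Doctor'] = doctors.groupby(level=0)['value'].first()
print(df)
```

The final assignment aligns on the index, so a row with no doctor anywhere would simply get NaN in 'Doctor'.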

Answer:

Keep the values of '1st responder' that are in the list docs, and fill the remaining rows with the first doctor that str.extract finds in the 'Associates' column:

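df['Doctor'] = (
    df['1st responder']
      # keep '1st responder' where it is a doctor; otherwise fall back to the
      # first match of the regex '(doc1|doc2|doc3)' in 'Associates'
      .where(lambda x: x.isin(docs),
             other=df['Associates'].str.extract(pat='(' + '|'.join(docs) + ')')[0])
)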
print(df)
#   1st responder        Associates Doctor
# 0          doc1        nur1, nur2   doc1
# 1          nur1        doc1, nur2   doc1
# 2          nur2  nur3, nur4, doc2   doc2
# 3          doc2              nur4   doc2

Input used:

import pandas as pd

df = pd.DataFrame({
    '1st responder': ['doc1', 'nur1', 'nur2', 'doc2'],
    'Associates': ['nur1, nur2', 'doc1, nur2', 'nur3, nur4, doc2', 'nur4']
})
docs = ['doc1', 'doc2', 'doc3']
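The same where/str.extract approach can be checked against the question's own data with this self-contained sketch. The names pattern and from_associates are illustrative, and expand=False is used so str.extract returns a Series directly, which is equivalent to the [0] in the answer.

```python
import pandas as pd

docs = ['doc1', 'doc2', 'doc3']
df = pd.DataFrame({
    '1st responder': ['doc1', 'nur1', 'nur3', 'doc2'],
    'Associates': ['nur1, nur2', 'doc1, nur2', 'nur1, nur2, doc2', 'nur2, nur1'],
})

# Regex alternation of all doctor names: '(doc1|doc2|doc3)'
pattern = '(' + '|'.join(docs) + ')'

# First doctor mentioned in 'Associates' (NaN where none is found).
from_associates = df['Associates'].str.extract(pattern, expand=False)

# Keep '1st responder' where it is a doctor, otherwise use the extracted name.
df['Doctor'] = df['1st responder'].where(df['1st responder'].isin(docs),
                                         from_associates)
print(df)
```

Note that this pattern matches substrings, so 'doc1' would also match inside a name such as 'doc10'. Wrapping it in word boundaries, r'\b(' + '|'.join(docs) + r')\b', avoids that, and re.escape on each name is worth adding if names can contain regex special characters.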