How to select a particular string from a column value and assign it to a new column in pandas

docs = ['doc1', 'doc2', 'doc3']

nurses = ['nur1', 'nur2', 'nur3']

1st responder	Associates
doc1	nur1, nur2
nur1	doc1, nur2
nur3	nur1, nur2, doc2
doc2	nur2, nur1

I want to add a new column 'Doctor' that holds the doctor found in either '1st responder' or 'Associates', so that the resulting df looks like:

1st responder	Associates	Doctor
doc1	nur1, nur2	doc1
nur1	doc1, nur2	doc1
nur3	nur1, nur2, doc2	doc2
doc2	nur2, nur1	doc2

Should I use apply/map or is there a vectorised method?

CodePudding user response:

You can use:

docs = ['doc1', 'doc2', 'doc3']

df['Doctor'] = df.assign(Associates=df['Associates'].str.split(', ')) \
                 .melt(ignore_index=False).explode('value') \
                 .query('value.isin(@docs)').groupby(level=0)['value'].first()

Output:

1st responder	Associates	Doctor
doc1	nur1, nur2	doc1
nur1	doc1, nur2	doc1
nur3	nur1, nur2, doc2	doc2
doc2	nur2, nur1	doc2
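
To see what each step of that chain does, here is a minimal sketch that rebuilds the question's frame and splits the pipeline into named intermediate steps. The names `split`, `stacked` and `doctor` are illustrative and not part of the answer above, and `melt(ignore_index=False)` assumes pandas 1.1 or later.

```python
import pandas as pd

# Sample frame taken from the question.
df = pd.DataFrame({
    '1st responder': ['doc1', 'nur1', 'nur3', 'doc2'],
    'Associates': ['nur1, nur2', 'doc1, nur2', 'nur1, nur2, doc2', 'nur2, nur1'],
})
docs = ['doc1', 'doc2', 'doc3']

# 1. Turn the comma-separated string into a list of names per row.
split = df.assign(Associates=df['Associates'].str.split(', '))

# 2. Stack both columns into a single 'value' column while keeping the
#    original row index, then give every list element its own row.
stacked = split.melt(ignore_index=False).explode('value')

# 3. Keep only the doctors, then take the first doctor seen per original row.
doctor = stacked[stacked['value'].isin(docs)].groupby(level=0)['value'].first()

# Assignment aligns on the index; a row with no doctor at all would get NaN.
df['Doctor'] = doctor
print(df)
```

Because the '1st responder' values are stacked before the 'Associates' values, a doctor in '1st responder' takes precedence when both columns contain one.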

CodePudding user response:

Keep the '1st responder' value where it is in the list docs, and fill the remaining rows with the result of str.extract on the second column:

df['Doctor'] = (
    df['1st responder']
      .where(lambda x: x.isin(docs),
             other=df['Associates'].str.extract(pat='(' + '|'.join(docs) + ')')[0])
)
print(df)
#   1st responder        Associates Doctor
# 0          doc1        nur1, nur2   doc1
# 1          nur1        doc1, nur2   doc1
# 2          nur2  nur3, nur4, doc2   doc2
# 3          doc2              nur4   doc2

Input used:

import pandas as pd

df = pd.DataFrame({
    '1st responder': ['doc1', 'nur1', 'nur2', 'doc2'],
    'Associates': ['nur1, nur2', 'doc1, nur2', 'nur3, nur4, doc2', 'nur4']
})
docs = ['doc1', 'doc2', 'doc3']
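
One caveat with the str.extract approach: a plain alternation such as doc1|doc2 also matches inside longer names. A hedged variant, assuming names consist of word characters and are separated by commas and spaces as in the sample, escapes each name and anchors it with word boundaries. The helper name `first_doctor` and the associate `doc10` are made up for illustration.

```python
import re
import pandas as pd

def first_doctor(df, docs):
    """Return the 1st responder if it is a doctor, else the first doctor in Associates."""
    # Escape each name and require word boundaries so that 'doc1'
    # does not match inside a longer name such as 'doc10'.
    pattern = r'\b(' + '|'.join(re.escape(d) for d in docs) + r')\b'
    from_associates = df['Associates'].str.extract(pattern)[0]
    is_doc = df['1st responder'].isin(docs)
    return df['1st responder'].where(is_doc, other=from_associates)

df = pd.DataFrame({
    '1st responder': ['doc1', 'nur1', 'nur2', 'doc2'],
    # 'doc10' is a hypothetical name that is not in docs.
    'Associates': ['nur1, nur2', 'doc1, nur2', 'nur3, doc10, doc2', 'nur4'],
})
docs = ['doc1', 'doc2', 'doc3']

df['Doctor'] = first_doctor(df, docs)
print(df)
```

Without the word boundaries, the third row would pick up 'doc1' from inside 'doc10' instead of 'doc2'.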