docs = [doc1, doc2, doc3]
nurses = [nur1, nur2, nur3]
1st responder | Associates |
---|---|
doc1 | nur1, nur2 |
nur1 | doc1, nur2 |
nur3 | nur1, nur2, doc2 |
doc2 | nur2, nur1 |
I want to add a new column 'Doctor' and select the doctor from either '1st responder' or from 'Associates' such that the resultant df is like:
1st responder | Associates | Doctor |
---|---|---|
doc1 | nur1, nur2 | doc1 |
nur1 | doc1, nur2 | doc1 |
nur3 | nur1, nur2, doc2 | doc2 |
doc2 | nur2, nur1 | doc2 |
Should I use apply/map or is there a vectorised method?
CodePudding user response:
You can use:
docs = ['doc1', 'doc2', 'doc3']
df['Doctor'] = df.assign(Associates=df['Associates'].str.split(', ')) \
                 .melt(ignore_index=False).explode('value') \
                 .query('value.isin(@docs)').groupby(level=0)['value'].first()
Output:
1st responder | Associates | Doctor |
---|---|---|
doc1 | nur1, nur2 | doc1 |
nur1 | doc1, nur2 | doc1 |
nur3 | nur1, nur2, doc2 | doc2 |
doc2 | nur2, nur1 | doc2 |
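To see what the chain is doing, here is the same idea split into steps. This is only a sketch: the intermediate names (split, stacked, first_doc) are mine, the DataFrame is rebuilt from the table in the question, and the query call is swapped for a plain boolean mask.

```python
import pandas as pd

docs = ['doc1', 'doc2', 'doc3']
df = pd.DataFrame({
    '1st responder': ['doc1', 'nur1', 'nur3', 'doc2'],
    'Associates': ['nur1, nur2', 'doc1, nur2', 'nur1, nur2, doc2', 'nur2, nur1'],
})

# 1. Turn each 'Associates' string into a list of names.
split = df.assign(Associates=df['Associates'].str.split(', '))

# 2. Stack both columns into a single 'value' column, keeping the original row index.
stacked = split.melt(ignore_index=False)

# 3. One name per row: lists are expanded, plain strings pass through unchanged.
stacked = stacked.explode('value')

# 4. Keep only doctor names, then take the first hit for each original row.
#    '1st responder' comes first in the stacked frame, so it wins over 'Associates'.
first_doc = stacked[stacked['value'].isin(docs)].groupby(level=0)['value'].first()

# Assignment aligns on the index; a row with no doctor at all would get NaN.
df['Doctor'] = first_doc
print(df)
```

Note that melt(ignore_index=False) needs pandas 1.1 or later.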
CodePudding user response:
Keep the values of '1st responder' that are in the list docs (checked with isin), and fill the remaining rows with the result of str.extract on the second column:
df['Doctor'] = (
    df['1st responder']
    .where(lambda x: x.isin(docs),
           other=df['Associates'].str.extract(pat='(' + '|'.join(docs) + ')')[0])
)
print(df)
#   1st responder        Associates Doctor
# 0          doc1        nur1, nur2   doc1
# 1          nur1        doc1, nur2   doc1
# 2          nur2  nur3, nur4, doc2   doc2
# 3          doc2              nur4   doc2
Input used:
import pandas as pd

df = pd.DataFrame({
    '1st responder': ['doc1', 'nur1', 'nur2', 'doc2'],
    'Associates': ['nur1, nur2', 'doc1, nur2', 'nur3, nur4, doc2', 'nur4']
})
docs = ['doc1', 'doc2', 'doc3']
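One caveat with the plain alternation pattern: it matches substrings, so 'doc1' would also be found inside a name like 'doc10'. If that can happen in your data, a hedged variant is to escape the names and add word boundaries. The 'doc10' entry below is a made-up value, used only to show the difference.

```python
import re
import pandas as pd

docs = ['doc1', 'doc2', 'doc3']
df = pd.DataFrame({
    '1st responder': ['doc1', 'nur1', 'nur2', 'doc2'],
    # 'doc10' is hypothetical: it is not in docs and should not be picked up as 'doc1'.
    'Associates': ['nur1, nur2', 'doc10, doc1', 'nur3, nur4, doc2', 'nur4'],
})

# Escape each name and anchor on word boundaries so only whole names match.
pattern = r'\b(' + '|'.join(map(re.escape, docs)) + r')\b'

df['Doctor'] = df['1st responder'].where(
    df['1st responder'].isin(docs),
    # expand=False returns a Series for a single capture group.
    other=df['Associates'].str.extract(pattern, expand=False),
)
print(df)
```

str.extract only returns the first match per row, which is what the question asks for here.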