I have the following dataframe
CUI CODE SAB TTY STR cui_name
C0000772 10028183 MDR LLT Multiple congenital anomalies Multiple congenital anomalies
C0000772 10002585 MDR LLT Anomalies congenital multiple Multiple congenital anomalies
C0000772 10002621 MDR LLT Anomaly congenital multiple (NOS) Multiple congenital anomalies
C0000772 10025533 MDR LLT Malformations multiple Multiple congenital anomalies
C0000772 10028182 MDR LLT Multiple congenital abnormalities Multiple congenital anomalies
C0000772 10028182 MDR PT Multiple congenital abnormalities Multiple congenital anomalies
C0000772 10028185 MDR LLT Multiple congenital malformations Multiple congenital anomalies
C0000772 D000015 MSH MH "Abnormalities, Multiple" Multiple congenital anomalies
C0001163 10000521 MDR LLT Acoustic nerve disorder NOS Vestibulocochlear Nerve Diseases
C0001163 10078794 MDR LLT Auditory nerve disorder Vestibulocochlear Nerve Diseases
C0001163 10078794 MDR PT Auditory nerve disorder Vestibulocochlear Nerve Diseases
C0001163 D000160 MSH MH Vestibulocochlear Nerve Diseases Vestibulocochlear Nerve Diseases
I would like to add an additional column ('MSH_ID') per CUI id using value of 'CODE' column when column 'SAB' value is 'MSH'
The expected output is:
CUI CODE SAB TTY STR cui_name MSH_ID
C0000772 10028183 MDR LLT Multiple congenital anomalies Multiple congenital anomalies D000015
C0000772 10002585 MDR LLT Anomalies congenital multiple Multiple congenital anomalies D000015
C0000772 10002621 MDR LLT Anomaly congenital multiple (NOS) Multiple congenital anomalies D000015
C0000772 10025533 MDR LLT Malformations multiple Multiple congenital anomalies D000015
C0000772 10028182 MDR LLT Multiple congenital abnormalities Multiple congenital anomalies D000015
C0000772 10028182 MDR PT Multiple congenital abnormalities Multiple congenital anomalies D000015
C0000772 10028185 MDR LLT Multiple congenital malformations Multiple congenital anomalies D000015
C0001163 10000521 MDR LLT Acoustic nerve disorder NOS Vestibulocochlear Nerve Diseases D000160
C0001163 10078794 MDR LLT Auditory nerve disorder Vestibulocochlear Nerve Diseases D000160
C0001163 10078794 MDR PT Auditory nerve disorder Vestibulocochlear Nerve Diseases D000160
I tried following lines but how to insert the values to other rows also (with same CUI id)
test['MSH_ID'] = test.loc[test['SAB'] == 'MSH', 'CODE']
Any help is highly appreciated!
CodePudding user response:
Try this:
df['MSH_ID'] = df.loc[df['SAB'] == 'MSH', 'CODE'].values[0]
df = df[~(df['SAB']=='MSH')]
print(df)
CUI CODE SAB TTY STR cui_name MSH_ID
0 C0000772 10028183 MDR LLT Multiple congenital anomalies Multiple congenital anomalies D000015
1 C0000772 10002585 MDR LLT Anomalies congenital multiple Multiple congenital anomalies D000015
2 C0000772 10002621 MDR LLT Anomaly congenital multiple (NOS) Multiple congenital anomalies D000015
3 C0000772 10025533 MDR LLT Malformations multiple Multiple congenital anomalies D000015
4 C0000772 10028182 MDR LLT Multiple congenital abnormalities Multiple congenital anomalies D000015
5 C0000772 10028182 MDR PT Multiple congenital abnormalities Multiple congenital anomalies D000015
6 C0000772 10028185 MDR LLT Multiple congenital malformations Multiple congenital anomalies D000015
UPDATE
for the new task with more than one CUI_Id
:
df['MSH_ID'] = (
df['CUI']
.map(
df
.groupby('CUI')
.apply(lambda x: x.loc[x['SAB'] == 'MSH', 'CODE'].values[0])
)
)
CUI CODE SAB TTY STR cui_name new
0 C0000772 10028183 MDR LLT Multiple congenital anomalies Multiple congenital anomalies D000015
1 C0000772 10002585 MDR LLT Anomalies congenital multiple Multiple congenital anomalies D000015
2 C0000772 10002621 MDR LLT Anomaly congenital multiple (NOS) Multiple congenital anomalies D000015
3 C0000772 10025533 MDR LLT Malformations multiple Multiple congenital anomalies D000015
4 C0000772 10028182 MDR LLT Multiple congenital abnormalities Multiple congenital anomalies D000015
5 C0000772 10028182 MDR PT Multiple congenital abnormalities Multiple congenital anomalies D000015
6 C0000772 10028185 MDR LLT Multiple congenital malformations Multiple congenital anomalies D000015
7 C0000772 D000015 MSH MH "Abnormalities, Multiple" Multiple congenital anomalies D000015
8 C0001163 10000521 MDR LLT Acoustic nerve disorder NOS Vestibulocochlear Nerve Diseases D000160
9 C0001163 10078794 MDR LLT Auditory nerve disorder Vestibulocochlear Nerve Diseases D000160
10 C0001163 10078794 MDR PT Auditory nerve disorder Vestibulocochlear Nerve Diseases D000160
11 C0001163 D000160 MSH MH Vestibulocochlear Nerve Diseases Vestibulocochlear Nerve Diseases D000160