Home > Software engineering >  Conditional creation of a new column using another column value in Python
Conditional creation of a new column using another column value in Python

Time:11-03

I have the following dataframe

CUI      CODE     SAB TTY STR                                cui_name
C0000772 10028183 MDR LLT Multiple congenital anomalies      Multiple congenital anomalies
C0000772 10002585 MDR LLT Anomalies congenital multiple      Multiple congenital anomalies
C0000772 10002621 MDR LLT Anomaly congenital multiple (NOS)  Multiple congenital anomalies
C0000772 10025533 MDR LLT Malformations multiple             Multiple congenital anomalies
C0000772 10028182 MDR LLT Multiple congenital abnormalities  Multiple congenital anomalies
C0000772 10028182 MDR PT Multiple congenital abnormalities   Multiple congenital anomalies
C0000772 10028185 MDR LLT Multiple congenital malformations  Multiple congenital anomalies
C0000772 D000015 MSH MH "Abnormalities, Multiple"            Multiple congenital anomalies
C0001163 10000521 MDR LLT Acoustic nerve disorder NOS       Vestibulocochlear Nerve Diseases
C0001163 10078794 MDR LLT Auditory nerve disorder           Vestibulocochlear Nerve Diseases
C0001163 10078794 MDR PT Auditory nerve disorder            Vestibulocochlear Nerve Diseases
C0001163 D000160 MSH MH Vestibulocochlear Nerve Diseases    Vestibulocochlear Nerve Diseases

I would like to add an additional column ('MSH_ID') per CUI id using value of 'CODE' column when column 'SAB' value is 'MSH'

The expected output is:

CUI        CODE     SAB TTY STR                 cui_name                      MSH_ID
C0000772  10028183 MDR LLT Multiple congenital anomalies     Multiple congenital anomalies D000015
C0000772  10002585 MDR LLT Anomalies congenital multiple     Multiple congenital anomalies D000015
C0000772  10002621 MDR LLT Anomaly congenital multiple (NOS) Multiple congenital anomalies D000015
C0000772  10025533 MDR LLT Malformations multiple            Multiple congenital anomalies D000015
C0000772  10028182 MDR LLT Multiple congenital abnormalities Multiple congenital anomalies D000015
C0000772  10028182 MDR PT  Multiple congenital abnormalities Multiple congenital anomalies D000015
C0000772  10028185 MDR LLT Multiple congenital malformations Multiple congenital anomalies D000015
C0001163 10000521  MDR LLT Acoustic nerve disorder NOS     Vestibulocochlear Nerve Diseases D000160
C0001163 10078794  MDR LLT Auditory nerve disorder         Vestibulocochlear Nerve Diseases D000160
C0001163 10078794  MDR PT  Auditory nerve disorder         Vestibulocochlear Nerve Diseases D000160

I tried following lines but how to insert the values to other rows also (with same CUI id)

test['MSH_ID'] = test.loc[test['SAB'] == 'MSH', 'CODE']

Any help is highly appreciated!

CodePudding user response:

Try this:

df['MSH_ID'] = df.loc[df['SAB'] == 'MSH', 'CODE'].values[0]
df = df[~(df['SAB']=='MSH')]
print(df)
        CUI      CODE  SAB  TTY                                STR                       cui_name   MSH_ID
0  C0000772  10028183  MDR  LLT      Multiple congenital anomalies  Multiple congenital anomalies  D000015
1  C0000772  10002585  MDR  LLT      Anomalies congenital multiple  Multiple congenital anomalies  D000015
2  C0000772  10002621  MDR  LLT  Anomaly congenital multiple (NOS)  Multiple congenital anomalies  D000015
3  C0000772  10025533  MDR  LLT             Malformations multiple  Multiple congenital anomalies  D000015
4  C0000772  10028182  MDR  LLT  Multiple congenital abnormalities  Multiple congenital anomalies  D000015
5  C0000772  10028182  MDR   PT  Multiple congenital abnormalities  Multiple congenital anomalies  D000015
6  C0000772  10028185  MDR  LLT  Multiple congenital malformations  Multiple congenital anomalies  D000015

UPDATE
for the new task with more than one CUI_Id:

df['MSH_ID'] = (
    df['CUI']
    .map(
        df
        .groupby('CUI')
        .apply(lambda x: x.loc[x['SAB'] == 'MSH', 'CODE'].values[0])
    )
)
         CUI      CODE  SAB  TTY                                STR                          cui_name      new
0   C0000772  10028183  MDR  LLT      Multiple congenital anomalies     Multiple congenital anomalies  D000015
1   C0000772  10002585  MDR  LLT      Anomalies congenital multiple     Multiple congenital anomalies  D000015
2   C0000772  10002621  MDR  LLT  Anomaly congenital multiple (NOS)     Multiple congenital anomalies  D000015
3   C0000772  10025533  MDR  LLT             Malformations multiple     Multiple congenital anomalies  D000015
4   C0000772  10028182  MDR  LLT  Multiple congenital abnormalities     Multiple congenital anomalies  D000015
5   C0000772  10028182  MDR   PT  Multiple congenital abnormalities     Multiple congenital anomalies  D000015
6   C0000772  10028185  MDR  LLT  Multiple congenital malformations     Multiple congenital anomalies  D000015
7   C0000772   D000015  MSH   MH          "Abnormalities, Multiple"     Multiple congenital anomalies  D000015
8   C0001163  10000521  MDR  LLT        Acoustic nerve disorder NOS  Vestibulocochlear Nerve Diseases  D000160
9   C0001163  10078794  MDR  LLT            Auditory nerve disorder  Vestibulocochlear Nerve Diseases  D000160
10  C0001163  10078794  MDR   PT            Auditory nerve disorder  Vestibulocochlear Nerve Diseases  D000160
11  C0001163   D000160  MSH   MH   Vestibulocochlear Nerve Diseases  Vestibulocochlear Nerve Diseases  D000160

  • Related