I have a dataframe with following columns.
CUI id term id term_name
C0000729 10000057 MDR LLT Abdominal cramps
C0000729 10000056 MDR LLT Abdominal cramp
C0000729 10011286 MDR LLT Cramp abdominal
C0000729 10000058 MDR LLT Abdominal crampy pains
C0000729 10093764 ICD10 PT Abdominal crampy pains
C0000800 10093765 ICD10 PT Abdominal pain
C0000800 10000058 MDR LLT Abdominal crampy pains
C0000800 10093764 ICD10AM PT Abdominal crampy pains
C0000730 10000052 MDR LLT Abdominal cramps back
I would like to fetch those rows (per CUI), if term is 'MDR' and and ICDs (ICD10, ICD10AM). But if it is only MDR, exclude it (eg. the last row 'C0000730').
The expected output is:
CUI id term id term_name
C0000729 10000057 MDR LLT Abdominal cramps
C0000729 10000056 MDR LLT Abdominal cramp
C0000729 10011286 MDR LLT Cramp abdominal
C0000729 10000058 MDR LLT Abdominal crampy pains
C0000729 10093764 ICD10 PT Abdominal crampy pains
C0000800 10093765 ICD10 PT Abdominal pain
C0000800 10000058 MDR LLT Abdominal crampy pains
C0000800 10093764 ICD10AM PT Abdominal crampy pains
I am using following lines of code using the above dataframe.
#select only those mappings where ICD and MeDRA both exists for a particular CUI id
s = set(['ICD10','ICD10CM','ICD10AM','MDR'])
dff_mapped = df_umls[df_umls.groupby('CUI')['SAB'].transform(lambda x: set(x) == s)]
dff_mapped = dff_mapped.sort_values(['CUI', 'SAB'],ascending = [True, True])
dff_mapped.to_csv('df_mapped', index = False, sep = ',')
Any help is highly appreciated.
CodePudding user response:
If I understood right:
nonMDR = df[df['term id'].str.startswith('ICD')] # creates a new df with ICDs
term_ids = nonMDR['CUI'].unique() # create an array of unique CUIs
df[df['CUI'].isin(term_ids)] # filter CUIs