I have the following pandas DataFrame (HC_subset_umls):
              term             code  source      term_normlz       CUI        CODE   SAB   TTY              STR
0  B-cell lymphoma  meddra:10003899  meddra  b-cell lymphoma  C0079731  MTHU019696  OMIM  PTCS  b-cell lymphoma
1  B-cell lymphoma  meddra:10003899  meddra  b-cell lymphoma  C0079731    10003899   MDR    PT  b-cell lymphoma
2      Astrocytoma  meddra:10003571  meddra      astrocytoma  C0004114    10003571   MDR    PT      astrocytoma
3      Astrocytoma  meddra:10003571  meddra      astrocytoma  C0004114     D001254   MSH    MH      astrocytoma
I would like to collapse the rows that share a CUI into a single row and generate a new set of columns (CODE, TTY, STR) for each SAB value.
The desired output is:
              term             code  source      term_normlz       CUI   OMIM_CODE  OMIM_TTY         OMIM_STR  MDR_CODE  MDR_TTY          MDR_STR  MSH_CODE  MSH_TTY      MSH_STR
0  B-cell lymphoma  meddra:10003899  meddra  b-cell lymphoma  C0079731  MTHU019696      PTCS  b-cell lymphoma  10003899       PT  b-cell lymphoma        NA       NA           NA
2      Astrocytoma  meddra:10003571  meddra      astrocytoma  C0004114          NA        NA               NA  10003571       PT      astrocytoma   D001254       MH  astrocytoma
I am using the following code:
HC_subset_umls['OMIM_CODE'] = (
    HC_subset_umls['CUI']
    .map(
        HC_subset_umls
        .groupby('CUI')
        .apply(lambda x: x.loc[x['SAB'].isin(['OMIM']), 'CODE'].values[0])
    )
)
HC_subset_umls['OMIM_TERM'] = (
    HC_subset_umls['CUI']
    .map(
        HC_subset_umls
        .groupby('CUI')
        .apply(lambda x: x.loc[x['SAB'].isin(['OMIM']), 'STR'].values[0])
    )
)
HC_subset_umls['OMIM_TTY'] = (
    HC_subset_umls['CUI']
    .map(
        HC_subset_umls
        .groupby('CUI')
        .apply(lambda x: x.loc[x['SAB'].isin(['OMIM']), 'TTY'].values[0])
    )
)
HC_subset_umls = HC_subset_umls[~(HC_subset_umls['SAB'].isin(['OMIM']))]
I then repeat this for the other 'SAB' values, such as 'MDR', and so on. However, I get the following error:
IndexError: index 0 is out of bounds for axis 0 with size 0
Any help is highly appreciated.
CodePudding user response:
Try using groupby and unstack, then flatten the MultiIndex column headers:
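# df is the question's HC_subset_umls DataFrame
df_out = (df.groupby(['term', 'code', 'source', 'term_normlz', 'CUI', 'SAB'])
            .first()                   # one row per (CUI, SAB) combination
            .unstack()                 # move SAB from the index into the columns
            .swaplevel(0, 1, axis=1))  # put SAB first: (SAB, field)
df_out.columns = df_out.columns.map('_'.join)  # e.g. ('MDR', 'CODE') -> 'MDR_CODE'
df_out.reset_index()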
Output:
              term             code  source      term_normlz       CUI  MDR_CODE  MSH_CODE   OMIM_CODE  MDR_TTY  MSH_TTY  OMIM_TTY          MDR_STR      MSH_STR         OMIM_STR
0      Astrocytoma  meddra:10003571  meddra      astrocytoma  C0004114  10003571   D001254         NaN       PT       MH       NaN      astrocytoma  astrocytoma              NaN
1  B-cell lymphoma  meddra:10003899  meddra  b-cell lymphoma  C0079731  10003899       NaN  MTHU019696       PT      NaN      PTCS  b-cell lymphoma          NaN  b-cell lymphoma
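If you want to try this end to end, here is a minimal self-contained sketch that rebuilds the four sample rows from the question and applies the same groupby/unstack idea. The names id_cols, sab_order, fields, and wide are only illustrative, and the final column order is an assumption, chosen to mirror the layout of the desired output (OMIM, then MDR, then MSH).

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "term": ["B-cell lymphoma", "B-cell lymphoma", "Astrocytoma", "Astrocytoma"],
    "code": ["meddra:10003899", "meddra:10003899",
             "meddra:10003571", "meddra:10003571"],
    "source": ["meddra"] * 4,
    "term_normlz": ["b-cell lymphoma", "b-cell lymphoma",
                    "astrocytoma", "astrocytoma"],
    "CUI": ["C0079731", "C0079731", "C0004114", "C0004114"],
    "CODE": ["MTHU019696", "10003899", "10003571", "D001254"],
    "SAB": ["OMIM", "MDR", "MDR", "MSH"],
    "TTY": ["PTCS", "PT", "PT", "MH"],
    "STR": ["b-cell lymphoma", "b-cell lymphoma", "astrocytoma", "astrocytoma"],
})

# Columns that identify one output row (illustrative name).
id_cols = ["term", "code", "source", "term_normlz", "CUI"]

# One row per (id columns, SAB), then spread SAB across the columns.
wide = df.groupby(id_cols + ["SAB"]).first().unstack("SAB")

# Columns are now (field, SAB) pairs; flatten them to "SAB_field".
wide.columns = [f"{sab}_{field}" for field, sab in wide.columns]

# Assumed ordering, chosen to match the desired output in the question.
sab_order = ["OMIM", "MDR", "MSH"]
fields = ["CODE", "TTY", "STR"]
ordered = [f"{sab}_{field}" for sab in sab_order for field in fields]

wide = wide[ordered].reset_index()
print(wide)
```

As far as I can tell, the IndexError in the question comes from CUI groups that have no row for the requested SAB (C0004114 has no OMIM row), so .values[0] indexes an empty array. With unstack, those missing combinations simply come out as NaN.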