Home > Blockchain >  how to split the csv columns into multiple rows in pandas
how to split the csv columns into multiple rows in pandas

Time:06-07

My data has below format

Version ID   Col1_ErrorCode  Col2_ErrorCode   Col3_ErrorCode  Col1_CID   Col2_CID    Col3_CID
1.0.0   555  1111            2222             3333            AAA       BBB          CCC  

I would like to have error code and CID into different rows like below.

Version  ID    Error    CID
1.0.0    555   1111     AAA
1.0.0    555   2222     BBB
1.0.0    555   3333     CCC

Can anyone please help me how to do this ?

CodePudding user response:

Let us try set_index then stack

s = df.set_index(['Version','ID'])
s.columns = pd.MultiIndex.from_tuples(s.columns.str.split('_').map(tuple))
out = s.stack(level=0).reset_index()
Out[315]: 
  Version   ID level_2  CID  ErrorCode
0   1.0.0  555    Col1  AAA       1111
1   1.0.0  555    Col2  BBB       2222
2   1.0.0  555    Col3  CCC       3333

CodePudding user response:

You can wide_to_long, but we need to fix your column names that way the stubs, 'CID' and 'ErrorCode', come before the underscore.

df.columns = ['_'.join(x[::-1]) for x in df.columns.str.split('_')]

df = (pd.wide_to_long(df, stubnames=['ErrorCode', 'CID'],
                      i=['Version', 'ID'], j='to_drop', suffix='.*', sep='_')
        .reset_index()
        .drop(columns='to_drop'))

print(df)
  Version   ID  ErrorCode  CID
0   1.0.0  555       1111  AAA
1   1.0.0  555       2222  BBB
2   1.0.0  555       3333  CCC
  • Related