My data has below format
Version ID Col1_ErrorCode Col2_ErrorCode Col3_ErrorCode Col1_CID Col2_CID Col3_CID
1.0.0 555 1111 2222 3333 AAA BBB CCC
I would like to have error code and CID into different rows like below.
Version ID Error CID
1.0.0 555 1111 AAA
1.0.0 555 2222 BBB
1.0.0 555 3333 CCC
Can anyone please help me how to do this ?
CodePudding user response:
Let us try set_index
then stack
s = df.set_index(['Version','ID'])
s.columns = pd.MultiIndex.from_tuples(s.columns.str.split('_').map(tuple))
out = s.stack(level=0).reset_index()
Out[315]:
Version ID level_2 CID ErrorCode
0 1.0.0 555 Col1 AAA 1111
1 1.0.0 555 Col2 BBB 2222
2 1.0.0 555 Col3 CCC 3333
CodePudding user response:
You can wide_to_long
, but we need to fix your column names that way the stubs, 'CID' and 'ErrorCode', come before the underscore.
df.columns = ['_'.join(x[::-1]) for x in df.columns.str.split('_')]
df = (pd.wide_to_long(df, stubnames=['ErrorCode', 'CID'],
i=['Version', 'ID'], j='to_drop', suffix='.*', sep='_')
.reset_index()
.drop(columns='to_drop'))
print(df)
Version ID ErrorCode CID
0 1.0.0 555 1111 AAA
1 1.0.0 555 2222 BBB
2 1.0.0 555 3333 CCC