I have a data set in excel like the following that will have a ~600 rows and varying number of rows that will be the same in Column A
Column A Column B
row1.rack1 row1.rack1.x,X,B,C
row1.rack1 row1.rack1.x,B,D,A
row1.rack2 row1.rack2.d,I,K,L
row1.rack2 row2.rack2.a,L,M,D
I want to able to be able to have only 1 of each string in Column A and for all the strings in Column B if they match Column A I want each one to be another column over. So in this case the output I'm looking for would be like...
Column A Column B Column C
row1.rack1 row1.rack1.x,X,B,C row1.rack1.x,B,D,A
row2.rack2 row1.rack2.d,I,K,L row2.rack2.a,L,M,D
I'm still pretty new to coding and wasn't sure where to start. Any help will be greatly appreciated!
CodePudding user response:
Use groupby
with pivot
:
(df.assign(idx=df.groupby('Column A').cumcount())
.pivot(index='Column A',columns='idx')['Column B']
.set_axis(['Column B', 'Column C'], axis=1)
.reset_index())
Column A Column B Column C
0 row1.rack1 row1.rack1.x,X,B,C row1.rack1.x,B,D,A
1 row1.rack2 row1.rack2.d,I,K,L row2.rack2.a,L,M,D
This code pivots the dataframe long to wide by the duplicates in Column A
.
If there might be arbitrary length of duplicate values, try:
import string
(df.assign(idx=df.groupby('Column A').cumcount())
.pivot(index='Column A',columns='idx')['Column B']
.rename(columns=lambda x: f'Column {string.ascii_uppercase[x 1]}')
.reset_index()
.rename_axis(columns=None))