Python Excel, Move corresponding strings to new column-CodePudding

I have a data set in excel like the following that will have a ~600 rows and varying number of rows that will be the same in Column A

Column A        Column B
row1.rack1      row1.rack1.x,X,B,C
row1.rack1      row1.rack1.x,B,D,A     
row1.rack2      row1.rack2.d,I,K,L
row1.rack2      row2.rack2.a,L,M,D

I want to able to be able to have only 1 of each string in Column A and for all the strings in Column B if they match Column A I want each one to be another column over. So in this case the output I'm looking for would be like...

Column A        Column B                Column C
row1.rack1      row1.rack1.x,X,B,C      row1.rack1.x,B,D,A
row2.rack2      row1.rack2.d,I,K,L      row2.rack2.a,L,M,D

I'm still pretty new to coding and wasn't sure where to start. Any help will be greatly appreciated!

CodePudding user response：

Use groupby with pivot:

(df.assign(idx=df.groupby('Column A').cumcount())
   .pivot(index='Column A',columns='idx')['Column B']
   .set_axis(['Column B', 'Column C'], axis=1)
   .reset_index())

     Column A            Column B            Column C
0  row1.rack1  row1.rack1.x,X,B,C  row1.rack1.x,B,D,A
1  row1.rack2  row1.rack2.d,I,K,L  row2.rack2.a,L,M,D

This code pivots the dataframe long to wide by the duplicates in Column A.

If there might be arbitrary length of duplicate values, try:

import string
(df.assign(idx=df.groupby('Column A').cumcount())
  .pivot(index='Column A',columns='idx')['Column B']
  .rename(columns=lambda x: f'Column {string.ascii_uppercase[x   1]}')
  .reset_index()
  .rename_axis(columns=None))