Home > Enterprise >  Python Excel, Move corresponding strings to new column
Python Excel, Move corresponding strings to new column

Time:09-23

I have a data set in excel like the following that will have a ~600 rows and varying number of rows that will be the same in Column A

Column A        Column B
row1.rack1      row1.rack1.x,X,B,C
row1.rack1      row1.rack1.x,B,D,A     
row1.rack2      row1.rack2.d,I,K,L
row1.rack2      row2.rack2.a,L,M,D

I want to able to be able to have only 1 of each string in Column A and for all the strings in Column B if they match Column A I want each one to be another column over. So in this case the output I'm looking for would be like...

Column A        Column B                Column C
row1.rack1      row1.rack1.x,X,B,C      row1.rack1.x,B,D,A
row2.rack2      row1.rack2.d,I,K,L      row2.rack2.a,L,M,D

I'm still pretty new to coding and wasn't sure where to start. Any help will be greatly appreciated!

CodePudding user response:

Use groupby with pivot:

(df.assign(idx=df.groupby('Column A').cumcount())
   .pivot(index='Column A',columns='idx')['Column B']
   .set_axis(['Column B', 'Column C'], axis=1)
   .reset_index())

     Column A            Column B            Column C
0  row1.rack1  row1.rack1.x,X,B,C  row1.rack1.x,B,D,A
1  row1.rack2  row1.rack2.d,I,K,L  row2.rack2.a,L,M,D

This code pivots the dataframe long to wide by the duplicates in Column A.

If there might be arbitrary length of duplicate values, try:

import string
(df.assign(idx=df.groupby('Column A').cumcount())
  .pivot(index='Column A',columns='idx')['Column B']
  .rename(columns=lambda x: f'Column {string.ascii_uppercase[x   1]}')
  .reset_index()
  .rename_axis(columns=None))
  • Related