add columns with duplicate key with pandas

Time:11-14

I have a dataframe that looks like this:

key variable1 variable2 variable3
A      x         5         s
A      x         6         t 
A      x         6         t 
B      x         5         s
B      x         6         t 
B      x         6         t 

And I would like to create a new dataframe with this structure

key variable1 variable2 variable3 variable4 variable5 variable6 variable7 variable8 variable9
A      x         5         s         x          6        t          x         6         t
B      x         5         s         x          6        t          x         6         t

If there were only 2 duplicates per key I would know how to do it (drop duplicates keeping the first, then keeping the last, and merge), but in this case I have 3 duplicates per key.

I have looked in other questions, but I have not found a case like this.

Thank you for your time!

CodePudding user response:

Use DataFrame.set_index with GroupBy.cumcount as a per-key counter, reshape with DataFrame.unstack, and finally set the new column names with a list comprehension:

df1 = (df.set_index(['key',df.groupby('key').cumcount()])
         .unstack()
         .sort_index(axis=1, level=1))

df1.columns = [f'variable{x}' for x in range(1, len(df1.columns) + 1)]
print (df1)
    variable1  variable2 variable3 variable4  variable5 variable6 variable7  \
key                                                                           
A           x          5         s         x          6         t         x   
B           x          5         s         x          6         t         x   

     variable8 variable9  
key                       
A            6         t  
B            6         t  

Finally, if necessary, turn key back into a regular column:

df1 = df1.reset_index()
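
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the table in the question, and the names counter and wide are only illustrative:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "key": ["A", "A", "A", "B", "B", "B"],
    "variable1": ["x"] * 6,
    "variable2": [5, 6, 6, 5, 6, 6],
    "variable3": ["s", "t", "t", "s", "t", "t"],
})

# Number the rows within each key: 0, 1, 2 for A, then 0, 1, 2 for B.
counter = df.groupby("key").cumcount()

wide = (
    df.set_index(["key", counter])    # index = (key, position within key)
      .unstack()                      # move the position into the columns
      .sort_index(axis=1, level=1)    # group the columns by position
)

# Flatten the two-level column index into variable1 ... variable9.
wide.columns = [f"variable{i}" for i in range(1, len(wide.columns) + 1)]
wide = wide.reset_index()
print(wide)
```

Note that sort_index orders the columns within each position alphabetically, which happens to match the original order here because the columns are named variable1, variable2, variable3.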

Alternative solution with DataFrame.pivot:

df1 = (df.assign(g = df.groupby('key').cumcount())
         .pivot(index='key', columns='g')
         .sort_index(axis=1, level=1))

df1.columns = [f'variable{x}' for x in range(1, len(df1.columns) + 1)]
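
If your real column names do not sort alphabetically in their original order, sort_index would shuffle them within each group. One possible workaround, sketched here with hypothetical data (the columns label and amount are made up for illustration), is to spell out the column order explicitly after the pivot:

```python
import pandas as pd

# Hypothetical data: same shape as the question, but the column names
# sort alphabetically in a different order than they appear.
df = pd.DataFrame({
    "key": ["A", "A", "B", "B"],
    "label": ["x", "y", "x", "y"],
    "amount": [5, 6, 7, 8],
})

value_cols = [c for c in df.columns if c != "key"]  # ["label", "amount"]
g = df.groupby("key").cumcount()                    # position within each key

wide = df.assign(g=g).pivot(index="key", columns="g")

# Explicit order: every value column for position 0, then position 1, ...
order = pd.MultiIndex.from_tuples(
    [(col, i) for i in range(g.max() + 1) for col in value_cols]
)
wide = wide.reindex(columns=order)

wide.columns = [f"variable{i}" for i in range(1, len(wide.columns) + 1)]
wide = wide.reset_index()
print(wide)
```

This assumes every key has the same number of rows; with uneven groups the missing positions would be filled with NaN.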