I have a dataframe that looks like this:
key variable1 variable2 variable3
A x 5 s
A x 6 t
A x 6 t
B x 5 s
B x 6 t
B x 6 t
I would like to create a new dataframe with this structure:
key variable1 variable2 variable3 variable4 variable5 variable6 variable7 variable8 variable9
A x 5 s x 6 t x 6 t
B x 5 s x 6 t x 6 t
If there were only 2 duplicates per key I would know how to do it (drop duplicates keeping the first, then keeping the last, and merge), but in this case I have 3 duplicates per key.
I have looked in other questions, but I have not found a case like this.
Thank you for your time!
CodePudding user response:
Use DataFrame.set_index with GroupBy.cumcount as a counter, reshape with DataFrame.unstack, and finally set the new column names with a list comprehension:
df1 = (df.set_index(['key', df.groupby('key').cumcount()])
         .unstack()
         .sort_index(axis=1, level=1))
df1.columns = [f'variable{x}' for x in range(1, len(df1.columns) + 1)]
print(df1)
    variable1  variable2 variable3 variable4  variable5 variable6 variable7  \
key
A           x          5         s         x          6         t         x
B           x          5         s         x          6         t         x

     variable8 variable9
key
A            6         t
B            6         t
Finally, if you need key as a regular column instead of the index:
df1 = df1.reset_index()
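To see why this works, it can help to look at the counter on its own. Below is a minimal, self-contained sketch that rebuilds the sample data from the question and keeps the cumcount result in a separate variable. The names counter and wide are just illustrative, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "key": ["A", "A", "A", "B", "B", "B"],
    "variable1": ["x"] * 6,
    "variable2": [5, 6, 6, 5, 6, 6],
    "variable3": ["s", "t", "t", "s", "t", "t"],
})

# cumcount numbers the rows inside each key group, restarting at 0 per key.
counter = df.groupby("key").cumcount()

# Each (key, counter) pair is unique, so unstack can move the counter
# level into the columns without hitting duplicate index entries.
wide = (
    df.set_index(["key", counter])
      .unstack()
      .sort_index(axis=1, level=1)  # group columns by counter value first
)
wide.columns = [f"variable{i}" for i in range(1, len(wide.columns) + 1)]
wide = wide.reset_index()
```

Without the counter, the index would contain repeated key values and unstack would have nothing to spread across the columns.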
Alternative solution with DataFrame.pivot:
df1 = (df.assign(g=df.groupby('key').cumcount())
         .pivot(index='key', columns='g')
         .sort_index(axis=1, level=1))
df1.columns = [f'variable{x}' for x in range(1, len(df1.columns) + 1)]
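One thing the sample data does not exercise is keys with different numbers of rows. As far as I can tell, the reshape still works and the missing slots are simply filled with NaN. Here is a small sketch under that assumption; the data, including key C, is made up for illustration and does not come from the question.

```python
import pandas as pd

# Hypothetical data: key A has three rows, key C has only one.
df_uneven = pd.DataFrame({
    "key": ["A", "A", "A", "C"],
    "variable1": ["x", "x", "x", "x"],
    "variable2": [5, 6, 6, 7],
    "variable3": ["s", "t", "t", "u"],
})

wide_uneven = (
    df_uneven.assign(g=df_uneven.groupby("key").cumcount())
             .pivot(index="key", columns="g")
             .sort_index(axis=1, level=1)
)
wide_uneven.columns = [
    f"variable{i}" for i in range(1, len(wide_uneven.columns) + 1)
]

# Count the slots that key C never had; they come back as NaN.
missing_for_c = int(wide_uneven.loc["C"].isna().sum())
```

If the NaN values are unwanted, they can be replaced afterwards, for example with fillna, though note that a numeric column containing NaN is usually converted to float.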