I would like to build a loop where the function will loop over the "counter" field creating subset dataframes and then join them back together
The below is a small example of what i want, in reality the counters can be in the hundreds:
data = {'name': ['Adam','Adam','Adam','Adam', 'nick','nick','nick','nick','nick','krish','krish','krish',],
'age': [20,20,20,20, 21,21,21,21,21, 19,19,19],
'Product': ['A','B','C','D','A','B','C','D','E','A','B','C'],
'Counter': [1,2,3,4,1,2,3,4,5,1,2,3]}
# Create DataFrame
df = pd.DataFrame(data)
df1 = df.loc[df['Counter'] == 1]
df1 = df1[['name','age','Product']]
df2 = df.loc[df['Counter'] == 2]
df2 = df2[['name','age','Product']]
df3 = df.loc[df['Counter'] == 3]
df3 = df3[['name','age','Product']]
df4 = df.loc[df['Counter'] == 4]
df4 = df4[['name','age','Product']]
df5 = df.loc[df['Counter'] == 5]
df5 = df5[['name','age','Product']]
dfs = [df1,df2,df3,df4,df5]
from functools import reduce
df_merged = reduce(lambda left,right: pd.merge(left,right,on=['name'], how='outer'), dfs)
CodePudding user response:
I believe what you looking for is called df.pivot
:
x = df.pivot(index="name", columns="Counter")
x = x[sorted(x.columns, key=lambda x: (x[1], x[0]))]
x.columns = [f"{a}_{b}" for a, b in x.columns.values]
print(x)
Prints:
Product_1 age_1 Product_2 age_2 Product_3 age_3 Product_4 age_4 Product_5 age_5
name
Adam A 20.0 B 20.0 C 20.0 D 20.0 NaN NaN
krish A 19.0 B 19.0 C 19.0 NaN NaN NaN NaN
nick A 21.0 B 21.0 C 21.0 D 21.0 E 21.0