How do I convert multiple rows to different columns?
I have a dataframe like this:
Brand Key Col_Name1 Percentage Col_Name2 Dollar_Value
A 1 Percentage_High 90 Dollar_Value_High 30000
A 1 Percentage_Low 70 Dollar_Value_Low 20000
B 2 Percentage_High 80 Dollar_Value_High 25000
B 2 Percentage_Low 60 Dollar_Value_Low 15000
C 3 Percentage_High Nan Dollar_Value_High Nan
C 3 Percentage_Low Nan Dollar_Value_Low Nan
I want to convert it to this way:
Brand Key Percentage_High Percentage_Low Dollar_Value_High Dollar_Value_Low
A 1 90 70 30000 20000
B 2 80 60 25000 15000
C 3 Nan Nan Nan Nan
I'm only able to do a single column currently:
df_pivot = df.pivot_table('Percentage', ['Brand', 'Key'], 'Col_Name1')
df_pivot.reset_index(drop=False, inplace=True)
But this only gives me one column and it also ignores Brand C where the values are Nan.
How do I do it for multiple columns and retain Nan values?
CodePudding user response:
I think you are looking for df.groupby. https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#named-aggregation
Example:
df.groupby(["Brand","Key"]).agg(
Percentage_High=("Percentage", "max"),
Percentage_Low=("Percentage", "min"),
Dollar_value_low=("Dollar_Value", "min"),
Dollar_value_high=("Dollar_Value", "max"),
)
Percentage_High Percentage_Low Dollar_value_low Dollar_value_high
Brand Key
A 1 90 70 20000 30000
B 2 80 60 15000 25000
C 3 Nan Nan Nan Nan
CodePudding user response:
try this:
def make_dict(g: pd.DataFrame):
result = dict(g.values[:, 2:].ravel().reshape(-1, 2))
return result
grouped = df.groupby(['Brand', 'Key'])
out = grouped.apply(make_dict).apply(pd.Series)
print(out)
>>>
Percentage_High Dollar_Value_High Percentage_Low Dollar_Value_Low
Brand Key
A 1 90 30000 70 20000
B 2 80 25000 60 15000
C 3 Nan Nan Nan Nan
CodePudding user response:
The groupby method works but only for this very particular case. I don't think it could work if you add for example in Col_Name_1
and Col_Name2
other values that could not be retrieved by aggregation on the Percentage
and Dollar_Value
column : see if i add in row at index 1 a mean value (which not necessarily corresponds to the mean of the min and max) :
Brand Key Col_Name1 Percentage Col_Name2 Dollar_Value
0 A 1 Percentage_High 90 Dollar_Value_High 30000
1 A 1 Percentage_Mean 85 Dollar_Value_Mean 28000
2 A 1 Percentage_Low 70 Dollar_Value_Low 20000
3 B 2 Percentage_High 80 Dollar_Value_High 25000
4 B 2 Percentage_Low 60 Dollar_Value_Low 15000
5 C 3 Percentage_High Nan Dollar_Value_High Nan
6 C 3 Percentage_Low Nan Dollar_Value_Low Nan
So i think a more general answer would be to do two pivots and then merge :
First pivot:
df1 = df.reset_index().pivot(index = ['Brand', 'Key'], columns = ['Col_Name1'], values = 'Percentage')
df1
output:
Col_Name1 Percentage_High Percentage_Low
Brand Key
A 1 90 70
B 2 80 60
C 3 NaN NaN
Second pivot:
df2 = df.reset_index().pivot(index = ['Brand', 'Key'], columns = ['Col_Name2'], values = 'Dollar_Value')
df2
output:
Col_Name2 Dollar_Value_High Dollar_Value_Low
Brand Key
A 1 30000 20000
B 2 25000 15000
C 3 NaN NaN
Join
df = df1.join(df2)
df
output:
Percentage_High Percentage_Low Dollar_Value_High Dollar_Value_Low
Brand Key
A 1 90 70 30000 20000
B 2 80 60 25000 15000
C 3 NaN NaN NaN NaN