Convert multiple rows to column headers and include nan values python df-CodePudding

How do I convert multiple rows to different columns?
I have a dataframe like this:

Brand    Key    Col_Name1          Percentage   Col_Name2           Dollar_Value
A        1      Percentage_High    90           Dollar_Value_High   30000
A        1      Percentage_Low     70           Dollar_Value_Low    20000
B        2      Percentage_High    80           Dollar_Value_High   25000
B        2      Percentage_Low     60           Dollar_Value_Low    15000
C        3      Percentage_High    Nan          Dollar_Value_High   Nan
C        3      Percentage_Low     Nan          Dollar_Value_Low    Nan

I want to convert it to this way:

Brand    Key    Percentage_High   Percentage_Low    Dollar_Value_High   Dollar_Value_Low
A        1      90                70                30000               20000
B        2      80                60                25000               15000
C        3      Nan               Nan               Nan                 Nan

I'm only able to do a single column currently:

df_pivot = df.pivot_table('Percentage', ['Brand', 'Key'], 'Col_Name1')
df_pivot.reset_index(drop=False, inplace=True)

But this only gives me one column and it also ignores Brand C where the values are Nan.
How do I do it for multiple columns and retain Nan values?

CodePudding user response：

I think you are looking for df.groupby. https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#named-aggregation

Example:

df.groupby(["Brand","Key"]).agg(
    Percentage_High=("Percentage", "max"),
    Percentage_Low=("Percentage", "min"),
    Dollar_value_low=("Dollar_Value", "min"),
    Dollar_value_high=("Dollar_Value", "max"),
)

          Percentage_High Percentage_Low Dollar_value_low Dollar_value_high
Brand Key                                                                  
A     1                90             70            20000             30000
B     2                80             60            15000             25000
C     3               Nan            Nan              Nan               Nan

CodePudding user response：

try this:

def make_dict(g: pd.DataFrame):
    result = dict(g.values[:, 2:].ravel().reshape(-1, 2))
    return result

grouped = df.groupby(['Brand', 'Key'])
out = grouped.apply(make_dict).apply(pd.Series)
print(out)
>>>

            Percentage_High  Dollar_Value_High  Percentage_Low  Dollar_Value_Low
Brand   Key             
A       1   90              30000               70              20000
B       2   80              25000               60              15000
C       3   Nan             Nan                 Nan             Nan

CodePudding user response：

The groupby method works but only for this very particular case. I don't think it could work if you add for example in Col_Name_1 and Col_Name2 other values that could not be retrieved by aggregation on the Percentage and Dollar_Value column : see if i add in row at index 1 a mean value (which not necessarily corresponds to the mean of the min and max) :

Brand   Key Col_Name1   Percentage  Col_Name2   Dollar_Value
0   A   1   Percentage_High 90  Dollar_Value_High   30000
1   A   1   Percentage_Mean 85  Dollar_Value_Mean   28000
2   A   1   Percentage_Low  70  Dollar_Value_Low    20000
3   B   2   Percentage_High 80  Dollar_Value_High   25000
4   B   2   Percentage_Low  60  Dollar_Value_Low    15000
5   C   3   Percentage_High Nan Dollar_Value_High   Nan
6   C   3   Percentage_Low  Nan Dollar_Value_Low    Nan

So i think a more general answer would be to do two pivots and then merge :

First pivot:

df1 = df.reset_index().pivot(index = ['Brand', 'Key'], columns = ['Col_Name1'], values = 'Percentage')
df1

output:

Col_Name1   Percentage_High Percentage_Low
Brand   Key     
A   1   90  70
B   2   80  60
C   3   NaN NaN

Second pivot:

df2 = df.reset_index().pivot(index = ['Brand', 'Key'], columns = ['Col_Name2'], values = 'Dollar_Value')
df2

output:

Col_Name2   Dollar_Value_High   Dollar_Value_Low
Brand   Key     
A   1   30000   20000
B   2   25000   15000
C   3   NaN NaN

Join

df = df1.join(df2)
df

output:

    Percentage_High Percentage_Low  Dollar_Value_High   Dollar_Value_Low
Brand   Key             
A   1   90  70  30000   20000
B   2   80  60  25000   15000
C   3   NaN NaN NaN NaN