How can I accomplish this grouping process in Python/pandas?

I want to group prices by pricing date for each combination of product and region, so that every date becomes its own column.

Old dataframe:

import pandas as pd

data_old = pd.DataFrame({'Product': ['Tom', 'Joseph', 'Tom', 'Joseph'], 'Region': ['SP', 'RJ', 'SP', 'RJ'], 'Price': [20, 21, 19, 18], 'Date': ['04/mar', '04/mar', '05/fev', '05/fev']})
data_old

    Product Region  Price   Date
0   Tom     SP      20      04/mar
1   Joseph  RJ      21      04/mar
2   Tom     SP      19      05/fev
3   Joseph  RJ      18      05/fev

Desired dataframe:

data_desired = pd.DataFrame({'Product': ['Tom', 'Joseph'], 'Region': ['SP', 'RJ'], '04/mar': [20, 21], '05/fev': [19, 18]})

data_desired

    Product Region  04/mar  05/fev
0   Tom     SP      20      19
1   Joseph  RJ      21      18

This is just an example dataframe. The real one has more columns, so I don't think a join-based solution would work well. The idea is to create one column per date, each holding the price for that date, while keeping all the other columns (Product, Region, Unit and so on) as row identifiers.

User response:

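You can use pivot_table. Pass values as a string rather than a list so the result keeps a single level of column labels:

data_new = data_old.pivot_table(index=['Product', 'Region'], columns='Date', values='Price').reset_index()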
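
A minimal sketch of the pivot_table route on the example data from the question. It illustrates that a list passed to values adds an outer 'Price' column level, while a string gives the flat layout shown in the desired dataframe. The names nested and flat are only illustrative, and aggfunc='first' is an assumption made here (one price per Product/Region/Date); the default aggregation is the mean.

```python
import pandas as pd

data_old = pd.DataFrame({
    'Product': ['Tom', 'Joseph', 'Tom', 'Joseph'],
    'Region': ['SP', 'RJ', 'SP', 'RJ'],
    'Price': [20, 21, 19, 18],
    'Date': ['04/mar', '04/mar', '05/fev', '05/fev'],
})

# values given as a list: the columns become a two-level MultiIndex
# such as ('Price', '04/mar').
nested = data_old.pivot_table(
    index=['Product', 'Region'], columns='Date',
    values=['Price'], aggfunc='first',
)

# values given as a string: the columns are just the dates.
# aggfunc='first' is an assumption (one price per combination).
flat = data_old.pivot_table(
    index=['Product', 'Region'], columns='Date',
    values='Price', aggfunc='first',
).reset_index()

# Drop the leftover 'Date' label on the column axis.
flat.columns.name = None

print(flat)
```

Rows come back sorted by the index keys, so Joseph appears before Tom.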

User response:

You can accomplish this as follows using grouping and unstacking:

data_new = data_old.groupby(['Product', 'Region', 'Date']).Price.first().unstack('Date').reset_index()

Output:

     Product Region  04/mar  05/fev
0     Joseph     RJ      21      18
1        Tom     SP      20      19

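Note that this approach also works when a product appears in more than one region, because Region is part of the grouping key. If a Product/Region/Date combination has several prices, first() keeps only the first one.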
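
For a dataframe with more identifying columns, the same groupby/unstack idea can be sketched by treating every column except Date and Price as a key. The Unit column, its values, the extra Tom/RJ row and the name key_cols are all hypothetical, added only to illustrate the case described in the question.

```python
import pandas as pd

# Example data extended with a hypothetical 'Unit' column and an
# extra row so that Tom appears in two regions.
data_old = pd.DataFrame({
    'Product': ['Tom', 'Joseph', 'Tom', 'Joseph', 'Tom'],
    'Region': ['SP', 'RJ', 'SP', 'RJ', 'RJ'],
    'Unit': ['kg', 'kg', 'kg', 'kg', 'kg'],
    'Price': [20, 21, 19, 18, 22],
    'Date': ['04/mar', '04/mar', '05/fev', '05/fev', '04/mar'],
})

# Every column other than Date and Price identifies a row.
key_cols = [c for c in data_old.columns if c not in ('Date', 'Price')]

data_new = (
    data_old.groupby(key_cols + ['Date'])['Price']
    .first()            # one price per key/date combination
    .unstack('Date')    # dates become columns
    .reset_index()
)
data_new.columns.name = None

print(data_new)
```

Combinations with no price on a given date (here Tom/RJ on 05/fev) come out as NaN, which also turns the price columns into floats.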