I want to group prices by date for each product and region combination, so that each date becomes its own column.
Old dataframe:
import pandas as pd
data_old = pd.DataFrame({'Product': ['Tom', 'Joseph', 'Tom', 'Joseph'], 'Region': ['SP', 'RJ', 'SP', 'RJ'], 'Price': [20, 21, 19, 18], 'Date': ['04/mar', '04/mar', '05/fev', '05/fev']})
data_old
  Product Region  Price    Date
0     Tom     SP     20  04/mar
1  Joseph     RJ     21  04/mar
2     Tom     SP     19  05/fev
3  Joseph     RJ     18  05/fev
Desired dataframe:
data_desired = pd.DataFrame({'Product': ['Tom', 'Joseph'], 'Region': ['SP', 'RJ'], '04/mar': [20, 21], '05/fev': [19, 18]})
data_desired
  Product Region  04/mar  05/fev
0     Tom     SP      20      19
1  Joseph     RJ      21      18
This is just an example dataframe. The real one has more columns, so I don't think a join-based solution would work well. The idea is to create one column per date, each holding the price for that date, while keeping all the other columns (Product, Region, Unit, and so on) as row identifiers.
CodePudding user response:
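You can use pivot_table. Pass columns and values as plain strings rather than lists, so that the result has flat column names instead of a ('Price', date) MultiIndex:
data_new = data_old.pivot_table(index=['Product', 'Region'], columns='Date', values='Price').reset_index()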
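As a self-contained sketch of the pivot_table approach on the sample data from the question: the aggfunc='first' argument and the rename_axis call are additions here, not part of the answer above. By default pivot_table averages the values when a Product/Region/Date combination appears more than once, which may or may not be what you want for prices.

```python
import pandas as pd

data_old = pd.DataFrame({
    'Product': ['Tom', 'Joseph', 'Tom', 'Joseph'],
    'Region': ['SP', 'RJ', 'SP', 'RJ'],
    'Price': [20, 21, 19, 18],
    'Date': ['04/mar', '04/mar', '05/fev', '05/fev'],
})

data_new = (
    data_old
    # Scalar columns/values keep the resulting column names flat.
    # aggfunc='first' is an assumption: it keeps the first price per group
    # instead of the default mean.
    .pivot_table(index=['Product', 'Region'], columns='Date',
                 values='Price', aggfunc='first')
    # Drop the leftover 'Date' label on the column axis.
    .rename_axis(columns=None)
    # Turn Product and Region back into ordinary columns.
    .reset_index()
)
print(data_new)
```

The rows come back sorted by Product and Region, so Joseph appears before Tom.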
CodePudding user response:
You can accomplish this as follows using grouping and unstacking:
data_new = data_old.groupby(['Product', 'Region', 'Date']).Price.first().unstack('Date').reset_index()
Output:
  Product Region  04/mar  05/fev
0  Joseph     RJ      21      18
1     Tom     SP      20      19
Note that this approach will work even if there are multiple Regions per Product.
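Since the real dataframe has more columns, here is a minimal sketch of how the same grouping could be extended to them. It assumes every column other than Date and Price should identify a row. The 'Unit' column and its values are hypothetical, added only to stand in for the extra fields mentioned in the question.

```python
import pandas as pd

# 'Unit' is a hypothetical extra column, not part of the original sample data.
data_old = pd.DataFrame({
    'Product': ['Tom', 'Joseph', 'Tom', 'Joseph'],
    'Region': ['SP', 'RJ', 'SP', 'RJ'],
    'Unit': ['kg', 'kg', 'kg', 'kg'],
    'Price': [20, 21, 19, 18],
    'Date': ['04/mar', '04/mar', '05/fev', '05/fev'],
})

# Every column except Date and Price becomes part of the row key.
key_cols = [c for c in data_old.columns if c not in ('Date', 'Price')]

data_new = (
    data_old
    .groupby(key_cols + ['Date'])['Price']
    .first()                     # keeps the first price if a key/date pair repeats
    .unstack('Date')             # one column per date
    .rename_axis(columns=None)   # drop the 'Date' label on the column axis
    .reset_index()
)
print(data_new)
```

Building key_cols from the dataframe itself avoids listing every identifier column by hand, but it only makes sense if none of the remaining columns vary within a Product/Region/Date group.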