I'm trying to get sales numbers of the last 5 years of a dataframe shown in additional columns, so I can see the sold items per year of the last 5 years.
Currently my code looks like this:
import pandas as pd
data = [
[1,'Apples','2017-02-23',10,0.4],
[2,'Oranges','2017-03-06',20,0.7],
[1,'Apples','2017-09-23',8,0.5],
[1,'Apples','2018-05-14',14,0.5],
[1,'Apples','2019-04-27',7,0.6],
[2,'Apples','2018-09-10',14,0.4],
[1,'Oranges','2018-07-12',9,0.7],
[1,'Oranges','2018-12-07',4,0.7]]
df = pd.DataFrame(data, columns = ['CustomerID','Product','Invoice Date','Amount','Price'])
df['Invoice Date'] = pd.to_datetime(df['Invoice Date']).dt.strftime('%Y')
grpyear = df.groupby(['CustomerID','Product','Invoice Date','Price'])
grpyear[['Amount']].sum()
How can I get the years to show in columns looking like this:
Customer ID | Product | Amount in 2017 | Amount in 2018 | etc.
CodePudding user response:
I think you did not mean to group by price. Please correct me if I'm wrong though.
In order to get a dataset like you asked:
# Removed `Price` from group
grpyear = df.groupby(['CustomerID','Product','Invoice Date'])
# Sum amounts by group
grpyear = grpyear[['Amount']].sum()
# Pivot result and fill NAs with 0
grpyear.reset_index().pivot(index=['CustomerID','Product'], columns=['Invoice Date'], values=['Amount']).fillna(0)