Home > other >  Output certain groupby-rows in a pandas dataframe as columns?
Output certain groupby-rows in a pandas dataframe as columns?

Time:10-17

I'm trying to get sales numbers of the last 5 years of a dataframe shown in additional columns, so I can see the sold items per year of the last 5 years.

Currently my code looks like this:

import pandas as pd

data = [
[1,'Apples','2017-02-23',10,0.4],
[2,'Oranges','2017-03-06',20,0.7],
[1,'Apples','2017-09-23',8,0.5],
[1,'Apples','2018-05-14',14,0.5],
[1,'Apples','2019-04-27',7,0.6],
[2,'Apples','2018-09-10',14,0.4],
[1,'Oranges','2018-07-12',9,0.7],
[1,'Oranges','2018-12-07',4,0.7]]

df = pd.DataFrame(data, columns = ['CustomerID','Product','Invoice Date','Amount','Price'])
df['Invoice Date'] = pd.to_datetime(df['Invoice Date']).dt.strftime('%Y')
grpyear = df.groupby(['CustomerID','Product','Invoice Date','Price'])
grpyear[['Amount']].sum()

How can I get the years to show in columns looking like this:

Customer ID | Product | Amount in 2017 | Amount in 2018 | etc.

CodePudding user response:

I think you did not mean to group by price. Please correct me if I'm wrong though.

In order to get a dataset like you asked:

# Removed `Price` from group
grpyear = df.groupby(['CustomerID','Product','Invoice Date'])
# Sum amounts by group
grpyear = grpyear[['Amount']].sum()
# Pivot result and fill NAs with 0
grpyear.reset_index().pivot(index=['CustomerID','Product'], columns=['Invoice Date'], values=['Amount']).fillna(0)
  • Related