Home > database >  how to use pandas groupby to aggregate data across multiple columns
how to use pandas groupby to aggregate data across multiple columns

Time:11-18

I have a pandas dataframe:

Reference timestamp sub_reference datatype_indicator figure
REF1 2022-09-01 10 A 23.6
REF1 2022-09-01 48 B 25.8
REF1 2022-09-02 10 A 17.4
REF1 2022-10-01 10 A 23.6
REF1 2022-10-01 48 B 25.8
REF1 2022-10-02 10 A 17.4
REF2 2022-09-01 10 A 23.6
REF2 2022-09-01 48 B 25.8
REF2 2022-09-02 10 A 17.4
REF2 2022-10-01 11 A 23.6
REF2 2022-10-01 47 B 25.8
REF2 2022-10-02 10 A 17.4
REF3 2022-09-01 10 A 23.6
REF3 2022-09-01 48 B 25.8
REF3 2022-09-02 10 A 17.4
REF3 2022-10-01 11 A 23.6
REF3 2022-10-01 47 B 25.8
REF3 2022-10-02 10 A 17.4

I need to group the data by 'Reference' and the month in 'timestamp' to produce an aggregated value of 'figure' for the reference/month..

I am trying the below code, but receive TypeError: unhashable type: 'Series'

dg = df1.groupby([
            pd.Grouper('reference'),
            pd.Grouper(df1['timestamp'].dt.month)
            ]).sum()
dg.index = dg.index.strftime('%B')
print(dg)

CodePudding user response:

# create a year-month from teh date
# groupby and sum figure
df['month'] = pd.to_datetime(df['timestamp']).dt.strftime('%Y-%b')
out= df.groupby(['Reference','month' ], as_index=False)['figure'].sum()

out

OR

# use assign to create month column
# group and sum figure

out= (df.assign(month=pd.to_datetime(df['timestamp']).dt.strftime('%Y-%b'))
 .groupby(['Reference','month' ], as_index=False)['figure'].sum())

out
    Reference   month   figure
0   REF1    2022-Oct    66.8
1   REF1    2022-Sep    66.8
2   REF2    2022-Oct    66.8
3   REF2    2022-Sep    66.8
4   REF3    2022-Oct    66.8
5   REF3    2022-Sep    66.8

CodePudding user response:

I've never used the pd.Grouper before, but I think your issue is with how it is treating the extraction of the month.

I tried it like this:

>>> # add a new column for month
>>> df1["month"] = df1["timestamp"].dt.month

>>> dg = df1.groupby(by=["Reference", "month"], as_index=False).agg({"figure":sum})
>>> dg
  Reference  month  figure
0      REF1      9    66.8
1      REF1     10    66.8
2      REF2      9    66.8
3      REF2     10    66.8
4      REF3      9    66.8
5      REF3     10    66.8

CodePudding user response:

grouper = pd.PeriodIndex(df['timestamp'], freq='M')
df.groupby(['Reference', grouper])['figure'].sum().reset_index()

result:

    Reference   timestamp   figure
0   REF1        2022-09     66.8
1   REF1        2022-10     66.8
2   REF2        2022-09     66.8
3   REF2        2022-10     66.8
4   REF3        2022-09     66.8
5   REF3        2022-10     66.8

if you want change to %B

grouper = pd.to_datetime(df['timestamp']).dt.strftime('%B')
df.groupby(['Reference', grouper])['figure'].sum().reset_index()

result:

    Reference   timestamp   figure
0   REF1        October     66.8
1   REF1        September   66.8
2   REF2        October     66.8
3   REF2        September   66.8
4   REF3        October     66.8
5   REF3        September   66.8
  • Related