I have a pandas dataframe:
Reference | timestamp | sub_reference | datatype_indicator | figure |
---|---|---|---|---|
REF1 | 2022-09-01 | 10 | A | 23.6 |
REF1 | 2022-09-01 | 48 | B | 25.8 |
REF1 | 2022-09-02 | 10 | A | 17.4 |
REF1 | 2022-10-01 | 10 | A | 23.6 |
REF1 | 2022-10-01 | 48 | B | 25.8 |
REF1 | 2022-10-02 | 10 | A | 17.4 |
REF2 | 2022-09-01 | 10 | A | 23.6 |
REF2 | 2022-09-01 | 48 | B | 25.8 |
REF2 | 2022-09-02 | 10 | A | 17.4 |
REF2 | 2022-10-01 | 11 | A | 23.6 |
REF2 | 2022-10-01 | 47 | B | 25.8 |
REF2 | 2022-10-02 | 10 | A | 17.4 |
REF3 | 2022-09-01 | 10 | A | 23.6 |
REF3 | 2022-09-01 | 48 | B | 25.8 |
REF3 | 2022-09-02 | 10 | A | 17.4 |
REF3 | 2022-10-01 | 11 | A | 23.6 |
REF3 | 2022-10-01 | 47 | B | 25.8 |
REF3 | 2022-10-02 | 10 | A | 17.4 |
I need to group the data by 'Reference' and the month in 'timestamp' to produce an aggregated value of 'figure' for the reference/month..
I am trying the below code, but receive TypeError: unhashable type: 'Series'
dg = df1.groupby([
pd.Grouper('reference'),
pd.Grouper(df1['timestamp'].dt.month)
]).sum()
dg.index = dg.index.strftime('%B')
print(dg)
CodePudding user response:
# create a year-month from teh date
# groupby and sum figure
df['month'] = pd.to_datetime(df['timestamp']).dt.strftime('%Y-%b')
out= df.groupby(['Reference','month' ], as_index=False)['figure'].sum()
out
OR
# use assign to create month column
# group and sum figure
out= (df.assign(month=pd.to_datetime(df['timestamp']).dt.strftime('%Y-%b'))
.groupby(['Reference','month' ], as_index=False)['figure'].sum())
out
Reference month figure
0 REF1 2022-Oct 66.8
1 REF1 2022-Sep 66.8
2 REF2 2022-Oct 66.8
3 REF2 2022-Sep 66.8
4 REF3 2022-Oct 66.8
5 REF3 2022-Sep 66.8
CodePudding user response:
I've never used the pd.Grouper
before, but I think your issue is with how it is treating the extraction of the month.
I tried it like this:
>>> # add a new column for month
>>> df1["month"] = df1["timestamp"].dt.month
>>> dg = df1.groupby(by=["Reference", "month"], as_index=False).agg({"figure":sum})
>>> dg
Reference month figure
0 REF1 9 66.8
1 REF1 10 66.8
2 REF2 9 66.8
3 REF2 10 66.8
4 REF3 9 66.8
5 REF3 10 66.8
CodePudding user response:
grouper = pd.PeriodIndex(df['timestamp'], freq='M')
df.groupby(['Reference', grouper])['figure'].sum().reset_index()
result:
Reference timestamp figure
0 REF1 2022-09 66.8
1 REF1 2022-10 66.8
2 REF2 2022-09 66.8
3 REF2 2022-10 66.8
4 REF3 2022-09 66.8
5 REF3 2022-10 66.8
if you want change to %B
grouper = pd.to_datetime(df['timestamp']).dt.strftime('%B')
df.groupby(['Reference', grouper])['figure'].sum().reset_index()
result:
Reference timestamp figure
0 REF1 October 66.8
1 REF1 September 66.8
2 REF2 October 66.8
3 REF2 September 66.8
4 REF3 October 66.8
5 REF3 September 66.8