Home > Software design >  Pandas GroupBy Total Row for Days of the Week, then sum only on one column
Pandas GroupBy Total Row for Days of the Week, then sum only on one column

Time:07-16

I attempted this by:

df = {
'inc_date':['06-Jul-2020','06-Jul-2020','06-Jul-2020','07-Jul-2020','08-Jul-2020','08-Jul-2020','09-Jul-2020',],
}
df = pd.DataFrame(dict(df))
df['inc_Day_of_Week'] = pd.DatetimeIndex(df['inc_date']).day_name() # Create ddd of the INC
print(df)

dfTemp = df.groupby(['inc_Day_of_Week']).size()
dfTemp = dfTemp.reset_index(name='inc_volume').sort_index()
# Add sorting so days of week appear Mon-Fri
cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
dfTemp['inc_Day_of_Week'] = pd.Categorical(dfTemp['inc_Day_of_Week'], categories=cats, ordered=True)
dfTemp.loc['Total'] = dfTemp.sum(numeric_only=True)
display(dfTemp)
print("", end='\n')

    inc_Day_of_Week inc_volume
0   Monday          3.0
1   Thursday        1.0
2   Tuesday         1.0
3   Wednesday       2.0
Total   NaN         7.0

This is whats required except for the NaN i.e. How to remove the NaN only sum on the one column?

Total           7.0

I tried to append but this does fail:

dfTotal1 = [[dfTemp.inc_volume.sum()]]
dfTotal2 = pd.DataFrame(dfTotal1,columns=['inc_volume'])
dfTemp.append(dfTotal2)

CodePudding user response:

Replacing NaN values with "" should solve your query. After you add the Total row to the dfTemp DataFrame, add this line of code

CODE

dfTemp.fillna(value="", inplace=True)

If you want to avoid calculating sum on categorical variables directly,

dfTotal = pd.DataFrame({"inc_Day_of_Week": "", "inc_volume": dfTemp.inc_volume.sum()}, index=["Total"])
dfTemp = pd.concat([dfTemp, dfTotal])

OUTPUT

          inc_Day_of_Week  inc_volume
0              Monday         3.0
1            Thursday         1.0
2             Tuesday         1.0
3           Wednesday         2.0
Total                         7.0

CodePudding user response:

You can also try this,

import pandas as pd

df = {
'inc_date':['06-Jul-2020','06-Jul-2020','06-Jul-2020','07-Jul-2020','08-Jul-2020','08-Jul-2020','09-Jul-2020',],
}
df = pd.DataFrame(dict(df))
df['inc_Day_of_Week'] = pd.DatetimeIndex(df['inc_date']).day_name()
df = pd.DataFrame(df.groupby('inc_Day_of_Week').count()).reset_index()
df.loc['Total'] = df.sum(numeric_only=True)
df.fillna(value='',inplace=True)
  • Related