I am trying to understand how to remove multiindex columns after applying groupby and agg - see below, last command returns (Target, count), (Target, sum) multiindex columns but I want a data frame without multiindex
# Sample Data Creation
lst_Purchase_Date = ['2022-02-28','2022-02-28','2022-02-28','2022-03-01','2022-03-01']
lst_Target = [0,0,1,1,1]
df_test = pd.DataFrame(list(zip(lst_Purchase_Date,lst_Target)),columns=['Purchase_Date','Target'])
df_test.head()
Purchase_Date Target
0 2022-02-28 0
1 2022-02-28 0
2 2022-02-28 1
3 2022-03-01 1
4 2022-03-01 1
Using groupby and agg results in column multi-index
result = df_test.groupby('Purchase_Date').agg(['sum','count'])
result = result.reset_index()
result.head()
Purchase_Date Target
sum count
0 2022-02-28 1 3
1 2022-03-01 2 2
I need result like below
Purchase_Date Target_Sum Target_Count
2022-02-28 1 3
2022-03-01 2 2
CodePudding user response:
You can use a more verbose syntax for agg where you get to specify the column names on the left, then the column to operate on, then the function to apply
import pandas as pd
df = pd.DataFrame(
{'Purchase_Date':
{0: '2022-02-28',
1: '2022-02-28',
2: '2022-02-28',
3: '2022-03-01',
4: '2022-03-01'},
'Target': {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}}
)
result = df.groupby('Purchase_Date').agg(
Target_Sum = ('Target','sum'),
Target_Count = ('Target','count'),
)
result