How to combine groupby two times in pandas?-CodePudding

I have a following problem. I have this dataframe:

df = pd.DataFrame({
    "id_person": ["10", "10", "11", "11", "11", "10"],
    "date": ["2021-08-30", "2021-08-30", "2021-08-30","2021-08-30","2021-08-31","2021-08-31",],
    "some_value": [34, 1, 31, 9, 5, 12],
})

I would like to compute for each day how many different people were working and how many rows are there for each day. I know how to compute each value separately and then to combine this into one dataframe using pd.merge. But is there more pythonic way?

Desired output is:

df_output = pd.DataFrame({
    "date": ["2021-08-30", "2021-08-31",],
    "persons": [2, 2],
    "count": [4, 2],
})

CodePudding user response：

Use GroupBy.agg with DataFrameGroupBy.nunique and DataFrameGroupBy.size:

df1 = (df.groupby('date', as_index=False)
         .agg(persons=('id_person', 'nunique'), count=('id_person', 'size')))
print (df1)
         date  persons  count
0  2021-08-30        2      4
1  2021-08-31        2      2

CodePudding user response：

Group the dataframe by date, then calculate aggregate for id_person column, you can pass keyword/value pair, where keyword is the column name for aggregate, and value is the aggregate function.

>>> df.groupby('date')['id_person'].agg(persons='nunique', count='size')
            persons  count
date                      
2021-08-30        2      4
2021-08-31        2      2