I have a following problem. I have this dataframe:
df = pd.DataFrame({
"id_person": ["10", "10", "11", "11", "11", "10"],
"date": ["2021-08-30", "2021-08-30", "2021-08-30","2021-08-30","2021-08-31","2021-08-31",],
"some_value": [34, 1, 31, 9, 5, 12],
})
I would like to compute for each day how many different people were working and how many rows are there for each day. I know how to compute each value separately and then to combine this into one dataframe using pd.merge
. But is there more pythonic way?
Desired output is:
df_output = pd.DataFrame({
"date": ["2021-08-30", "2021-08-31",],
"persons": [2, 2],
"count": [4, 2],
})
CodePudding user response:
Use GroupBy.agg
with DataFrameGroupBy.nunique
and DataFrameGroupBy.size
:
df1 = (df.groupby('date', as_index=False)
.agg(persons=('id_person', 'nunique'), count=('id_person', 'size')))
print (df1)
date persons count
0 2021-08-30 2 4
1 2021-08-31 2 2
CodePudding user response:
Group the dataframe by date
, then calculate aggregate for id_person
column, you can pass keyword/value pair, where keyword is the column name for aggregate, and value is the aggregate function.
>>> df.groupby('date')['id_person'].agg(persons='nunique', count='size')
persons count
date
2021-08-30 2 4
2021-08-31 2 2