Given the following pandas dataframe, I need to obtain (1) the count of the number of rows per date, and (2) the GDP and unemployment in those dates.
This is the dataframe:
df = pd.DataFrame({
'loan_id': [1,2,3,4,5,6,7],
'gdp': [1.1, 1.2, 1.3, 1.4, 1.1, 1.2, 1.3],
'unempl': [3.1, 3.2, 3.3, 3.4, 3.1, 3.2, 3.3],
'the_date': [datetime(2020,1,1),datetime(2020,1,2),datetime(2020,1,3),datetime(2020,1,4),
datetime(2020,1,1),datetime(2020,1,2),datetime(2020,1,3)]
})
loan_id gdp unempl the_date
0 1 1.1 3.1 2020-01-01
1 2 1.2 3.2 2020-01-02
2 3 1.3 3.3 2020-01-03
3 4 1.4 3.4 2020-01-04
4 5 1.1 3.1 2020-01-01
5 6 1.2 3.2 2020-01-02
6 7 1.3 3.3 2020-01-03
The resulting dataframe should be:
the_date gdp unempl count
2020-01-01 1.1 3.1 2
2020-01-02 1.2 3.2 2
2020-01-03 1.3 3.3 2
2020-01-04 1.4 3.4 1
I can obtain a count with a group by:
df2 = df.groupby('the_date')['the_date'].count()
the_date
2020-01-01 2
2020-01-02 2
2020-01-03 2
2020-01-04 1
Name: the_date, dtype: int64
but I don't know how to append the gdp and unemployment, how can that be achieved?
CodePudding user response:
df.groupby(['the_date','gdp','unempl' ])['loan_id'].agg(count='count').reset_index()
the_date gdp unempl count
0 2020-01-01 1.1 3.1 2
1 2020-01-02 1.2 3.2 2
2 2020-01-03 1.3 3.3 2
3 2020-01-04 1.4 3.4 1
CodePudding user response:
df.groupby(["gdp", "unempl", "the_date"]).the_date.agg("count"))
gdp unempl the_date
1.1 3.1 2020-01-01 2
1.2 3.2 2020-01-02 2
1.3 3.3 2020-01-03 2
1.4 3.4 2020-01-04 1
Name: the_date, dtype: int64