Concatenate group by with distinct values in pandas

Time:09-19

Given the following pandas dataframe, I need to obtain (1) the number of rows per date and (2) the GDP and unemployment values for each of those dates.

This is the dataframe:

from datetime import datetime

import pandas as pd

df = pd.DataFrame({
    'loan_id': [1,2,3,4,5,6,7],
    'gdp': [1.1, 1.2, 1.3, 1.4, 1.1, 1.2, 1.3],
    'unempl': [3.1, 3.2, 3.3, 3.4, 3.1, 3.2, 3.3],
    'the_date': [datetime(2020,1,1),datetime(2020,1,2),datetime(2020,1,3),datetime(2020,1,4),
                datetime(2020,1,1),datetime(2020,1,2),datetime(2020,1,3)]
})

   loan_id  gdp  unempl    the_date
0        1  1.1     3.1  2020-01-01
1        2  1.2     3.2  2020-01-02
2        3  1.3     3.3  2020-01-03
3        4  1.4     3.4  2020-01-04
4        5  1.1     3.1  2020-01-01
5        6  1.2     3.2  2020-01-02
6        7  1.3     3.3  2020-01-03

The resulting dataframe should be:

the_date    gdp   unempl   count
2020-01-01  1.1   3.1      2
2020-01-02  1.2   3.2      2
2020-01-03  1.3   3.3      2  
2020-01-04  1.4   3.4      1

I can obtain a count with a group by:

df2 = df.groupby('the_date')['the_date'].count()

the_date
2020-01-01    2
2020-01-02    2
2020-01-03    2
2020-01-04    1
Name: the_date, dtype: int64

but I don't know how to add the gdp and unempl columns to that result. How can that be achieved?

CodePudding user response:

df.groupby(['the_date', 'gdp', 'unempl'])['loan_id'].agg(count='count').reset_index()

     the_date  gdp  unempl  count
0  2020-01-01  1.1     3.1      2
1  2020-01-02  1.2     3.2      2
2  2020-01-03  1.3     3.3      2
3  2020-01-04  1.4     3.4      1
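
Grouping by all three columns works here because gdp and unempl take a single value on each date. Under that same assumption, a possible alternative is to group by the_date alone and pick the first gdp and unempl value per group with named aggregation. This is only a sketch: the variable name `result` is made up for illustration, and if a date could carry more than one gdp or unempl value, 'first' would silently hide the others.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({
    'loan_id': [1, 2, 3, 4, 5, 6, 7],
    'gdp': [1.1, 1.2, 1.3, 1.4, 1.1, 1.2, 1.3],
    'unempl': [3.1, 3.2, 3.3, 3.4, 3.1, 3.2, 3.3],
    'the_date': [datetime(2020, 1, 1), datetime(2020, 1, 2),
                 datetime(2020, 1, 3), datetime(2020, 1, 4),
                 datetime(2020, 1, 1), datetime(2020, 1, 2),
                 datetime(2020, 1, 3)],
})

# Group by date only; assumes gdp and unempl are constant within a date,
# so taking the first value of each group loses no information.
result = (
    df.groupby('the_date')
      .agg(
          gdp=('gdp', 'first'),
          unempl=('unempl', 'first'),
          count=('loan_id', 'count'),  # number of non-null loan_id rows per date
      )
      .reset_index()
)

print(result)
```

The columns come out in the order the_date, gdp, unempl, count, which matches the layout asked for in the question.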

CodePudding user response:

df.groupby(["gdp", "unempl", "the_date"]).the_date.agg("count")

gdp  unempl  the_date  
1.1  3.1     2020-01-01    2
1.2  3.2     2020-01-02    2
1.3  3.3     2020-01-03    2
1.4  3.4     2020-01-04    1
Name: the_date, dtype: int64
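
This answer returns a Series with a three-level index rather than the flat dataframe shown in the question. One way to get from one to the other, sketched here with the illustrative names `counts` and `out`, is to call `reset_index(name="count")` on the Series and then reorder the columns.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({
    'loan_id': [1, 2, 3, 4, 5, 6, 7],
    'gdp': [1.1, 1.2, 1.3, 1.4, 1.1, 1.2, 1.3],
    'unempl': [3.1, 3.2, 3.3, 3.4, 3.1, 3.2, 3.3],
    'the_date': [datetime(2020, 1, 1), datetime(2020, 1, 2),
                 datetime(2020, 1, 3), datetime(2020, 1, 4),
                 datetime(2020, 1, 1), datetime(2020, 1, 2),
                 datetime(2020, 1, 3)],
})

# Series indexed by (gdp, unempl, the_date), holding the row count per group.
counts = df.groupby(["gdp", "unempl", "the_date"])["the_date"].count()

# Turn the index levels into columns and name the value column "count",
# then put the columns in the order requested in the question.
out = counts.reset_index(name="count")[["the_date", "gdp", "unempl", "count"]]

print(out)
```

Passing `name="count"` avoids a clash between the Series name (the_date) and the index level of the same name.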