Home > OS >  How would you aggregate this data using pandas?
How would you aggregate this data using pandas?

Time:07-25

Imagine having this data:

d = pd.DataFrame({
    'country': ['France', 'France', 'Germany', 'Germany', 'France', 'Spain', 'Germany'],
    'user': [1, 2, 3, 4, 5, 6, 7],
    'age': [20, 32, 19, 28, 22, 36, 23],
    'language': ['Python', 'C', 'Python', 'Python', 'Python', 'Go', 'Go'],
})
    country user    age language
0   France  1       20  Python
1   France  2       32  C
2   Germany 3       19  Python
3   Germany 4       28  Python
4   France  5       22  Python
5   Spain   6       36  Go
6   Germany 7       23  Go

What I'm looking for is to aggregate it in this form:

         TotalUser    Python    C    Go
France       3           2      1     1
Germany      3           2      0     1
Spain        1           0      0     1

I'm able to get a total count of users for each country using:

d.groupby(['country']).size().reset_index(name='total')

Output:

    country total
0   France  3
1   Germany 3
2   Spain   1

I'm also able to get a total count of users grouped by country and language:

d[['country', 'language']].groupby(['country', 'language']).size()

Output:

country  language
France   C           1
         Python      2
Germany  Go          1
         Python      2
Spain    Go          1
dtype: int64

Is there anyway to aggregate these data in my desired form in any other way? (Without using two different aggregation/group by)? If not, How can I merge these two in my desired form?

CodePudding user response:

In your case just do pd.crosstab

out = pd.crosstab(d.country,
                  d.language,
                  margins = True,
                  margins_name = 'Total_user').drop(['Total_user'])
Out[474]: 
language  C  Go  Python  Total_user
country                            
France    1   0       2           3
Germany   0   1       2           3
Spain     0   1       0           1
  • Related