Imagine having this data:
d = pd.DataFrame({
'country': ['France', 'France', 'Germany', 'Germany', 'France', 'Spain', 'Germany'],
'user': [1, 2, 3, 4, 5, 6, 7],
'age': [20, 32, 19, 28, 22, 36, 23],
'language': ['Python', 'C', 'Python', 'Python', 'Python', 'Go', 'Go'],
})
country user age language
0 France 1 20 Python
1 France 2 32 C
2 Germany 3 19 Python
3 Germany 4 28 Python
4 France 5 22 Python
5 Spain 6 36 Go
6 Germany 7 23 Go
What I'm looking for is to aggregate it in this form:
TotalUser Python C Go
France 3 2 1 1
Germany 3 2 0 1
Spain 1 0 0 1
I'm able to get a total count of users for each country using:
d.groupby(['country']).size().reset_index(name='total')
Output:
country total
0 France 3
1 Germany 3
2 Spain 1
I'm also able to get a total count of users grouped by country
and language
:
d[['country', 'language']].groupby(['country', 'language']).size()
Output:
country language
France C 1
Python 2
Germany Go 1
Python 2
Spain Go 1
dtype: int64
Is there anyway to aggregate these data in my desired form in any other way? (Without using two different aggregation/group by)? If not, How can I merge these two in my desired form?
CodePudding user response:
In your case just do pd.crosstab
out = pd.crosstab(d.country,
d.language,
margins = True,
margins_name = 'Total_user').drop(['Total_user'])
Out[474]:
language C Go Python Total_user
country
France 1 0 2 3
Germany 0 1 2 3
Spain 0 1 0 1