I have data like this, it's output of a groupby:
numUsers = df.groupby(["user","isvalid"]).count()
count
user isvalid
5 0.0 1336
1.0 387
But I need to have count of count_valid and count_invalid columns for each user, like this:
count_valid count_invalid
user
5 387 1336
How can I do it in optimized way in Pandas?
CodePudding user response:
You can use:
out = (df.groupby(["user","isvalid"]).count()
.rename({0: 'count_invalid', 1: 'count_valid'}, level=1)
['count'].unstack()
)
Output:
isvalid count_invalid count_valid
user
5 1336 387
Or, more generic if you have multiple columns, using a MultiIndex:
out = (df.groupby(["user","isvalid"]).count()
.unstack().rename(columns={0: 'invalid', 1: 'valid'}, level=1)
)
out.columns = out.columns.map('_'.join)
Output:
count_invalid count_valid
user
5 1336 387
Or from the original dataset with a crosstab
:
pd.crosstab(df['user'], df['isvalid'].map({0: 'count_invalid', 1: 'count_valid'}))
CodePudding user response:
You can replace groupby_count
by value_counts
:
>>> (df.replace({'isvalid': {0: 'count_invalid', 1: 'count_valid'}})
.value_counts(['user', 'isvalid']).unstack('isvalid')
.rename_axis(columns=None))
count_invalid count_valid
user
5 1336 387
Another version with pivot_table
:
>>> (df.replace({'isvalid': {0: 'count_invalid', 1: 'count_valid'}}).assign(count=1)
.pivot_table(index='user', columns='isvalid', values='count', aggfunc='count')
.rename_axis(columns=None))
count_invalid count_valid
user
5 1336 387