I'm trying to use groupby in pandas to group by a variable column and count the number of times a value shows up in the each group.
For example, using this group:
d = {'Period': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
'Result': ['True','True','False','True','False','True','False','True','True','False','False','True','False','True','False','False']}
df = pd.DataFrame(data=d)
df.sort_values(by=['Period'], inplace=True)
print(df)
I'd like to count how many times 'True' or 'False' shows up in each period. Outputting something like this:
Period
1 : 2 True, 2 False
2 : 2 True, 1 False
3 : 0 True, 4 False
3 : 3 True, 1 False
The problem I'm having is that none of the methods in the examples I found do quite that.
.count()
alone just counts the # of entries in each period
.nunique()
returns the number of unique entries
.unique()
returns the unique entries that exist but doesn't count them...
If you run this full example code you'll see what I mean:
#create dataframe
d = {'Period': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
'Result': ['True','True','False','True','False','True','False','True','True','False','False','True','False','True','False','False']}
df = pd.DataFrame(data=d)
df.sort_values(by=['Period'], inplace=True)
print(df)
#group by and print counts
print(df.groupby('Period')['Result'].count())
print(df.groupby('Period')['Result'].nunique())
print(df.groupby('Period')['Result'].unique())
CodePudding user response:
Use pd.crosstab
:
print(pd.crosstab(df["Period"], df["Result"]))
Prints:
Result False True
Period
1 2 2
2 1 3
3 4 0
4 1 3
CodePudding user response:
Using collections.Counter
:
df.groupby('Period')['Result'].apply(Counter).fillna(0).unstack()
output:
True False
Period
1 2.0 2.0
2 3.0 1.0
3 0.0 4.0
4 3.0 1.0
Using value_counts
:
df.groupby('Period')['Result'].value_counts().unstack().fillna(0)
output:
Result False True
Period
1 2.0 2.0
2 1.0 3.0
3 4.0 0.0
4 1.0 3.0