Home > database >  Group by column in Pandas and count Unique values in each group
Group by column in Pandas and count Unique values in each group

Time:08-31

I'm trying to use groupby in pandas to group by a variable column and count the number of times a value shows up in the each group.

For example, using this group:

d = {'Period': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4], 
'Result': ['True','True','False','True','False','True','False','True','True','False','False','True','False','True','False','False']}
df = pd.DataFrame(data=d)
df.sort_values(by=['Period'], inplace=True)
print(df)

I'd like to count how many times 'True' or 'False' shows up in each period. Outputting something like this:

Period
1 : 2 True, 2 False
2 : 2 True, 1 False
3 : 0 True, 4 False
3 : 3 True, 1 False

The problem I'm having is that none of the methods in the examples I found do quite that.

.count() alone just counts the # of entries in each period

.nunique() returns the number of unique entries

.unique() returns the unique entries that exist but doesn't count them...

If you run this full example code you'll see what I mean:

#create dataframe
d = {'Period': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4], 
'Result': ['True','True','False','True','False','True','False','True','True','False','False','True','False','True','False','False']}
df = pd.DataFrame(data=d)
df.sort_values(by=['Period'], inplace=True)
print(df)

#group by and print counts 
print(df.groupby('Period')['Result'].count())
print(df.groupby('Period')['Result'].nunique())
print(df.groupby('Period')['Result'].unique())

CodePudding user response:

Use pd.crosstab:

print(pd.crosstab(df["Period"], df["Result"]))

Prints:

Result  False  True
Period             
1           2     2
2           1     3
3           4     0
4           1     3

CodePudding user response:

Using collections.Counter:

df.groupby('Period')['Result'].apply(Counter).fillna(0).unstack()

output:

        True  False
Period             
1        2.0    2.0
2        3.0    1.0
3        0.0    4.0
4        3.0    1.0

Using value_counts:

df.groupby('Period')['Result'].value_counts().unstack().fillna(0)

output:

Result  False  True
Period             
1         2.0   2.0
2         1.0   3.0
3         4.0   0.0
4         1.0   3.0
  • Related