Home > Software design >  Count number of specific value within groups in pandas
Count number of specific value within groups in pandas

Time:11-30

I have a dataframe such as

Groups Values 
A cow
A cow
A cow
A dog
B cat
B cat
B cat 
B fish
C cow
C fish
C cat 

And I would like to add a new column (count) where I count the number of cow tab['Values'] within each tab['Groups'].

I should then get:

Groups Values Count 
A cow 3
A cow 3
A cow 3
A dog 3
B cat 0
B cat 0
B cat 0
B fish 0
C cow 1
C fish 1
C cat 1

thanks a lot for your help

CodePudding user response:

Another way is to use value_counts and reindex after filtering your dataframe:

df['count'] = df[df['Values'] == 'cow'].value_counts('Groups') \
                                       .reindex(df['Groups'], fill_value=0).values
print(df)

# Output:
   Groups Values  count
0       A    cow      3
1       A    cow      3
2       A    cow      3
3       A    dog      3
4       B    cat      0
5       B    cat      0
6       B    cat      0
7       B   fish      0
8       C    cow      1
9       C   fish      1
10      C    cat      1

Performance

For 676 groups, 10k records

# with groupby, @jezrael
%timeit df['Values'].eq('cow').groupby(df['Groups']).transform('sum')
1.58 ms ± 25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# with value_counts, @Corralien
%timeit df[df['Values'] == 'cow'].value_counts('Groups').reindex(df['Groups'], fill_value=0).values
1.37 ms ± 3.58 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


# Setup, so ugly
import strings
groups = [a b for a,b in zip(np.random.choice(list(string.ascii_uppercase), 10000), np.random.choice(list(string.ascii_uppercase), 10000))]
animals = np.random.choice(pd.read_csv('https://gist.githubusercontent.com/atduskgreg/3cf8ef48cb0d29cf151bedad81553a54/raw/82f142562cf50b0f6fb8010f890b2f934093553e/animals.txt', error_bad_lines=False, header=None, squeeze=True), 10000)
df = pd.DataFrame({'Groups': groups, 'Values': animals})

CodePudding user response:

Use GroupBy.transform with mask for count Trues by sum:

df['count'] = df['Values'].eq('cow').groupby(df['Groups']).transform('sum')

print (df)
   Groups Values  count
0       A    cow      3
1       A    cow      3
2       A    cow      3
3       A    dog      3
4       B    cat      0
5       B    cat      0
6       B    cat      0
7       B   fish      0
8       C    cow      1
9       C   fish      1
10      C    cat      1
  • Related