I have a dataframe such as
Groups Values
A cow
A cow
A cow
A dog
B cat
B cat
B cat
B fish
C cow
C fish
C cat
And I would like to add a new column (count) where I count the number of cow tab['Values']
within each tab['Groups']
.
I should then get:
Groups Values Count
A cow 3
A cow 3
A cow 3
A dog 3
B cat 0
B cat 0
B cat 0
B fish 0
C cow 1
C fish 1
C cat 1
thanks a lot for your help
CodePudding user response:
Another way is to use value_counts
and reindex
after filtering your dataframe:
df['count'] = df[df['Values'] == 'cow'].value_counts('Groups') \
.reindex(df['Groups'], fill_value=0).values
print(df)
# Output:
Groups Values count
0 A cow 3
1 A cow 3
2 A cow 3
3 A dog 3
4 B cat 0
5 B cat 0
6 B cat 0
7 B fish 0
8 C cow 1
9 C fish 1
10 C cat 1
Performance
For 676 groups, 10k records
# with groupby, @jezrael
%timeit df['Values'].eq('cow').groupby(df['Groups']).transform('sum')
1.58 ms ± 25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# with value_counts, @Corralien
%timeit df[df['Values'] == 'cow'].value_counts('Groups').reindex(df['Groups'], fill_value=0).values
1.37 ms ± 3.58 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Setup, so ugly
import strings
groups = [a b for a,b in zip(np.random.choice(list(string.ascii_uppercase), 10000), np.random.choice(list(string.ascii_uppercase), 10000))]
animals = np.random.choice(pd.read_csv('https://gist.githubusercontent.com/atduskgreg/3cf8ef48cb0d29cf151bedad81553a54/raw/82f142562cf50b0f6fb8010f890b2f934093553e/animals.txt', error_bad_lines=False, header=None, squeeze=True), 10000)
df = pd.DataFrame({'Groups': groups, 'Values': animals})
CodePudding user response:
Use GroupBy.transform
with mask for count True
s by sum
:
df['count'] = df['Values'].eq('cow').groupby(df['Groups']).transform('sum')
print (df)
Groups Values count
0 A cow 3
1 A cow 3
2 A cow 3
3 A dog 3
4 B cat 0
5 B cat 0
6 B cat 0
7 B fish 0
8 C cow 1
9 C fish 1
10 C cat 1