I have not been able to come up with a better title, it's a really simple issue though, I just don't know what to call it exactly.
I have a database of horses simplified here:
horse_name | stable_name |
---|---|
Horse1 | Stable1 |
I am only interested in further analyzing records which feature stables that own many horses so I wanted to filter out the small stables (ones with less than 10 horses).
What I've tried:
Attempt 1:
Step 1: df['Stable'].value_counts() > 10
-> gives me boolean values, I inteded to use this to only querry the part of the database that satisfied this condition.
Step 2: df[df['Stable'].value_counts() > 10]
-> I wrap this in another df, hoping I get the result that I want, but I don't, I get a key error.
Attempt 2:
Step 1: df['Stable'].value_counts().sort_values(ascending=False).head(21)
-> a little clunky, but by trial and error, I figured out there are 21 stables with more than 10 horses, and this query returned just those stables. All I needed now is to filter the database out using this result.
Step 2: df[df['Stable'].value_counts().sort_values(ascending=False).head(21)]
-> same issue, returns a key error.
I also tried: df[df['Stable'] in df['Stable'].value_counts() > 10]
again, that didn't work, and I don't think I'll sleep today.
Can anyone explain why this is happening in a way that I can understand? And how should this be done instead?
CodePudding user response:
.value_counts() returns a series where it counts the unique values of the values in the column.
Try this:
df[df['Stable'] > 10]
CodePudding user response:
Try this out If you post the data, I would have been able to share the result
# create a temporary column 'b' by grouping on stable
# transform associates the result to all rows that are part of groupby
# we then check for 'b' and where its value is true
(df[df.assign(b=df.groupby(['Stable'],as_index=False)['Stable']
.transform(lambda x: x.count() < 10) )['b']
.eq(True)])