Home > Blockchain >  Selecting rows based on a '>' condition of the iteration of one of the columns
Selecting rows based on a '>' condition of the iteration of one of the columns

Time:10-19

I have not been able to come up with a better title, it's a really simple issue though, I just don't know what to call it exactly.

I have a database of horses simplified here:

horse_name stable_name
Horse1 Stable1

I am only interested in further analyzing records which feature stables that own many horses so I wanted to filter out the small stables (ones with less than 10 horses).

What I've tried:

Attempt 1:

Step 1: df['Stable'].value_counts() > 10 -> gives me boolean values, I inteded to use this to only querry the part of the database that satisfied this condition.

Step 2: df[df['Stable'].value_counts() > 10] -> I wrap this in another df, hoping I get the result that I want, but I don't, I get a key error.

Attempt 2:

Step 1: df['Stable'].value_counts().sort_values(ascending=False).head(21) -> a little clunky, but by trial and error, I figured out there are 21 stables with more than 10 horses, and this query returned just those stables. All I needed now is to filter the database out using this result.

Step 2: df[df['Stable'].value_counts().sort_values(ascending=False).head(21)] -> same issue, returns a key error.

I also tried: df[df['Stable'] in df['Stable'].value_counts() > 10] again, that didn't work, and I don't think I'll sleep today.

Can anyone explain why this is happening in a way that I can understand? And how should this be done instead?

CodePudding user response:

.value_counts() returns a series where it counts the unique values of the values in the column.

Try this:

df[df['Stable'] > 10]

CodePudding user response:

Try this out If you post the data, I would have been able to share the result

# create a temporary column 'b' by grouping on stable
# transform associates the result to all rows that are part of groupby
# we then check for 'b' and where its value is true

(df[df.assign(b=df.groupby(['Stable'],as_index=False)['Stable']
              .transform(lambda x: x.count() < 10) )['b']
    .eq(True)])
  • Related