Filtering rows that have unique value in a column using pandas


I have a df:

id      value
1       10
2       15
1       10
1       10
2       13
3       10
3       20

I am trying to keep only the rows whose id has exactly 1 unique value in column value, so that the resulting df looks like this:

id      value
1       10
1       10
1       10

I dropped id = 2 and id = 3 because each has more than 1 unique value in column value: 15, 13 and 10, 20 respectively.

I read this answer, but that simply removes duplicates, whereas I want to check whether a given column - in this case value - has more than 1 unique value per id.

I tried:

df['uniques'] = pd.Series(df.groupby('id')['value'].nunique())

But this returns NaN for every row, since the grouped result has only n rows (one per id) and its id-based index does not align with the m rows of the original df. I can write a function and apply it to every row, but I was wondering if there is a smart, quick filter that achieves my goal.
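For reference, a minimal reproduction of the sample data (names as in the question) showing why the grouped result cannot line up with the original rows - it has one entry per id and is indexed by id, not by the row labels 0..6:

import pandas as pd

# sample frame from the question
df = pd.DataFrame({
    'id':    [1, 2, 1, 1, 2, 3, 3],
    'value': [10, 15, 10, 10, 13, 10, 20],
})

# one row per id, indexed by id -> assigning this back to df gives NaN
print(df.groupby('id')['value'].nunique())
# id
# 1    1
# 2    2
# 3    2
# Name: value, dtype: int64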

CodePudding user response:

Use transform with groupby to align the group values to the individual rows:

df['nuniques'] = df.groupby('id')['value'].transform('nunique')

Output:

   id  value  nuniques
0   1     10         1
1   2     15         2
2   1     10         1
3   1     10         1
4   2     13         2
5   3     10         2
6   3     20         2

If you only need to filter your data, you don't need to assign the new column:

df[df.groupby('id')['value'].transform('nunique') == 1]
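As a side note (not part of the answer above, just a sketch): if you do want the per-id count as a column without transform, mapping the grouped result back through the id values also aligns it correctly, which is exactly what the original assignment attempt was missing:

# df is the sample frame from the question
counts = df.groupby('id')['value'].nunique()   # Series indexed by id
df['nuniques'] = df['id'].map(counts)          # align by id instead of row index
df[df['nuniques'] == 1]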

CodePudding user response:

Let us use GroupBy.filter:

out = df.groupby('id').filter(lambda x: x['value'].nunique() == 1)
Out[6]: 
   id  value
0   1     10
2   1     10
3   1     10
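
Both answers select the same rows on the sample data; a quick sketch to confirm (rebuilding the frame from the question). As a rough rule, the transform-based mask usually scales better than the Python-level lambda in filter when there are many groups:

import pandas as pd

df = pd.DataFrame({'id':    [1, 2, 1, 1, 2, 3, 3],
                   'value': [10, 15, 10, 10, 13, 10, 20]})

out_filter = df.groupby('id').filter(lambda x: x['value'].nunique() == 1)
out_mask = df[df.groupby('id')['value'].transform('nunique') == 1]

print(out_filter.equals(out_mask))   # True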