I have a df:
id value
1 10
2 15
1 10
1 10
2 13
3 10
3 20
I am trying to keep only the rows whose id has exactly 1 unique value in column value, so that the result df looks like this:
id value
1 10
1 10
1 10
I dropped id = 2 and id = 3 because each has more than 1 unique value in column value (15, 13 and 10, 20 respectively).
I read this answer, but that simply removes duplicates, whereas I want to check whether a given column - in this case column value - has more than 1 unique value per id.
I tried:
df['uniques'] = pd.Series(df.groupby('id')['value'].nunique())
But this returns NaN for every row, since I am trying to fit n group-level results onto m rows after grouping. I can write a function and apply it to every row, but I was wondering if there is a smart, quick filter that achieves my goal.
CodePudding user response:
Use transform with groupby to broadcast each group's value back onto its individual rows:
df['nuniques'] = df.groupby('id')['value'].transform('nunique')
Output:
id value nuniques
0 1 10 1
1 2 15 2
2 1 10 1
3 1 10 1
4 2 13 2
5 3 10 2
6 3 20 2
If you only need to filter your data, you don't need to assign the new column:
df[df.groupby('id')['value'].transform('nunique') == 1]
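Putting it together on the question's sample data (the mask variable name is just illustrative), a minimal runnable sketch:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id":    [1, 2, 1, 1, 2, 3, 3],
    "value": [10, 15, 10, 10, 13, 10, 20],
})

# transform('nunique') broadcasts each group's unique-value count
# back onto every row of that group, so it lines up with the original
# index and can be used directly as a boolean mask
mask = df.groupby("id")["value"].transform("nunique") == 1
out = df[mask]
print(out)
```

The surviving rows keep their original index (0, 2, 3), which is why transform is the idiomatic tool for row-level filters based on group-level aggregates.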
CodePudding user response:
Let us use filter:
out = df.groupby('id').filter(lambda x : x['value'].nunique()==1)
Out[6]:
id value
0 1 10
2 1 10
3 1 10
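A self-contained sketch of the same filter approach on the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "id":    [1, 2, 1, 1, 2, 3, 3],
    "value": [10, 15, 10, 10, 13, 10, 20],
})

# filter() calls the lambda once per group and drops every group
# for which it returns False, keeping the surviving rows' original index
out = df.groupby("id").filter(lambda x: x["value"].nunique() == 1)
print(out)
```

Note that filter runs a Python-level callable per group, so on frames with many groups the vectorized transform('nunique') mask from the other answer will typically be faster; filter reads more declaratively when the per-group condition is complex.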