This is my dataframe:
df = pd.DataFrame({
'User': ["alex", "alex", "ravi", "dodo", "dodo", "dodo", "cokie","dodo","nemo","ravi"],
'Id': ['a', 'b', 'b', 'a', 'b', 'b', 'c','a','e','b']
})
This is how my dataframe looklike:
User Id
0 alex a
1 alex b
2 ravi b
3 dodo a
4 dodo b
5 dodo b
6 cokie c
7 dodo a
8 nemo e
9 ravi b
I first counted the no of items for each user using the following code:
df_group = df.groupby(['User', 'Id'])
# size of group to count observations
df_group = df_group.size()
# make a column name
df_group = df_group.reset_index(name='Observation')
This is how it looks:
User Id Observation
0 alex a 1
1 alex b 1
2 cokie c 1
3 dodo a 2
4 dodo b 2
5 nemo e 1
6 ravi b 2
I want to remove a user who is coming 1 time and also observation is 1. For example user nemo and cokie. But I don't want to remove user alex, even though items a and b are coming 1 time or user ravi.
How can I do it?
My end goal:
User Id Observation
0 alex a 1
1 alex b 1
3 dodo a 2
4 dodo b 2
6 ravi b 2
CodePudding user response:
To do it in an automated way I suggest to use a merge after using .groupby(['User']).filter(lambda x: len(x)
to filter those Users that show up only once.
df_group = df.groupby(['User', 'Id'])
# size of group to count observations
df_group = df_group.size()
# make a column name
df_group = df_group.reset_index(name='Observation').merge(df.groupby(['User']).filter(lambda x: len(x) > 1)).drop_duplicates()
Which outputs:
User Id Observation
0 alex a 1
1 alex b 1
2 dodo a 2
4 dodo b 2
6 ravi b 2
Alternatively without using merge
:
df_group = df.groupby(['User', 'Id'],as_index=False)
# size of group to count observations
df_group = df_group.size()
# make a column name
df_group = df_group.loc[df_group['User'].isin(df.groupby(['User']).filter(lambda x: len(x) > 1)['User'].unique())]
CodePudding user response:
This works:
s = df.groupby('User').size() > 1
df = df.merge(df_group, how='left')[df['User'].isin(s[s].index)].drop_duplicates()