I am trying write a code that can read from a dataframe that has thousands of rows like:
name value
abc123 wrd
abc123 wrd
abc123 wrd
abc987 wrd
abc987 wrd
abc987 bbs
.. ..
I want the code to show me the name that has only the value wrd, for example in this case after the code works it should only show the result as abc123, since abc987 has both wrd and bss as value.
What I did so far is:
import pandas as pd
result = df[df['value'].isin(['wrd', 'bbs'])]
this results in both abc123 and abc987
result = df[df['value'].isin(['wrd'])]
this results in again both abc123 and abc987
What should I use instead to get name of the ones that only has the value as wrd?
CodePudding user response:
Fitler out group if not equal wrd
with test not equal scalar by Series.ne
and not matched name
s by Series.isin
:
result = df[~df['name'].isin(df.loc[df['value'].ne('wrd'), 'name'])]
print (result)
name value
0 abc123 wrd
1 abc123 wrd
2 abc123 wrd
Details:
print (df.loc[df['value'].ne('wrd'), 'name'])
5 abc987
Name: name, dtype: object()
print (~df['name'].isin(df.loc[df['value'].ne('wrd'), 'name']))
0 True
1 True
2 True
3 False
4 False
5 False
Name: name, dtype: bool()
Or use GroupBy.transform
with GroupBy.all
for test if match all values:
result = df[df['value'].eq('wrd').groupby(df['name']).transform('all')]
print (result)
name value
0 abc123 wrd
1 abc123 wrd
2 abc123 wrd
print (df['value'].eq('wrd').groupby(df['name']).transform('all'))
0 True
1 True
2 True
3 False
4 False
5 False
Name: name, dtype: bool
CodePudding user response:
try:
df
name value
0 abc123 wrd
1 abc123 wrd
2 abc123 wrd
3 abc987 wrd
4 abc987 wrd
5 abc987 bbs
df1 = df.groupby('name')['value'].apply(lambda x: list(set(x))).reset_index()
df1['value'] = df1['value'].map(lambda x: 1 if x == ['wrd'] else 0)
df1
name value
0 abc123 1
1 abc987 0
df1.loc[df1['value'].eq(1)]
name value
0 abc123 1