I want to filter based on whether a value is in another column. However this data needs to be grouped before the isin filter in applied. When I do this I get the error
'SeriesGroupBy' object has no attribute 'isin'
Example explaining what I'm trying to do:
import pandas as pd
dict = {'AttributeName': {0: 'John', 1: 'John', 2: 'John', 3: 'John', 4: 'Sally', 5: 'Sally'}, 'Lineage Step': {0: 1, 1: 2, 2: 3, 3: 4, 4:1, 5:2}, 'From Country': {0: 'Spain', 1: 'Scotland', 2: 'England', 3: 'England', 4: 'Scotland', 5:'England'}, 'From Town': {0: 'Madrid', 1: 'Edinburgh', 2: 'London', 3: 'London', 4: 'Edinburgh', 5: 'Manchester'}, 'FromStreet': {0: 'Spanish St', 1: 'Main St', 2: 'Lower St', 3: 'Middle St', 4: 'London St', 5: 'Scotland St'}, 'ToCountry': {0: 'Scotland', 1: 'England', 2: 'England', 3: 'England', 4: 'England', 5: 'England'}, 'ToTown': {0: 'Edinburgh', 1: 'London', 2: 'London', 3: 'London', 4: 'Liverpool', 5: 'London'}, 'ToStreet': {0: 'Lower St', 1: 'Middle St', 2: 'Upper St', 3: 'Upper St', 4: 'new St', 5: 'Old St'}}
sample_data = pd.DataFrame.from_dict(dict)
#example data set. I want to find every unique 'fromCountry' for both John and Sally. So For John we would just have the first row, where he enters from Spain to Scotland. The second row would be filtered as Scotland appears in the 'ToCountry' column. Sally would just have the 'FromCountry' Edinburgh row.
I have tried to do like this:
sample_grouped = sample_data.groupby('AttributeName')
sample_grouped[~sample_grouped['From Country'].isin(sample_grouped['ToCountry'])]
but I get there error 'SeriesGroupBy' object has no attribute 'isin'
Does anyone know how to use the isin (or comparable) function on grouped by data?
Thanks
CodePudding user response:
The error is self-explanatory, the isin
method you are trying to use is not there in Pandas Groupby object.
You can call apply
on pandas groupby object, then pass a lambda
function that returns only the rows that match the criteria.
out = (sample_data.groupby('AttributeName')
.apply(lambda x: x[~x['From Country'].isin(x['ToCountry'])])
)
OUTPUT:
AttributeName Lineage Step From Country From Town FromStreet ToCountry ToTown ToStreet
AttributeName
John 0 John 1 Spain Madrid Spanish St Scotland Edinburgh Lower St
Sally 4 Sally 1 Scotland Edinburgh London St England Liverpool new St
CodePudding user response:
use pandas query and find matching records then use unique() to group by
dict = {'AttributeName': {0: 'John', 1: 'John', 2: 'John', 3: 'John', 4: 'Sally', 5: 'Sally'}, 'Lineage Step': {0: 1, 1: 2, 2: 3, 3: 4, 4:1, 5:2}, 'From Country': {0: 'Spain', 1: 'Scotland', 2: 'England', 3: 'England', 4: 'Scotland', 5:'England'}, 'From Town': {0: 'Madrid', 1: 'Edinburgh', 2: 'London', 3: 'London', 4: 'Edinburgh', 5: 'Manchester'}, 'FromStreet': {0: 'Spanish St', 1: 'Main St', 2: 'Lower St', 3: 'Middle St', 4: 'London St', 5: 'Scotland St'}, 'ToCountry': {0: 'Scotland', 1: 'England', 2: 'England', 3: 'England', 4: 'England', 5: 'England'}, 'ToTown': {0: 'Edinburgh', 1: 'London', 2: 'London', 3: 'London', 4: 'Liverpool', 5: 'London'}, 'ToStreet': {0: 'Lower St', 1: 'Middle St', 2: 'Upper St', 3: 'Upper St', 4: 'new St', 5: 'Old St'}}
df = pd.DataFrame.from_dict(dict)
results=df.query('`From Country` not in ToCountry')
print(results['From Country'].unique())
output:
['Spain']