Home > Enterprise >  Using the isin() function on grouped data
Using the isin() function on grouped data

Time:12-02

I want to filter based on whether a value is in another column. However this data needs to be grouped before the isin filter in applied. When I do this I get the error

'SeriesGroupBy' object has no attribute 'isin'

Example explaining what I'm trying to do:

 import pandas as pd

dict = {'AttributeName': {0: 'John', 1: 'John', 2: 'John', 3: 'John', 4: 'Sally', 5: 'Sally'}, 'Lineage Step': {0: 1, 1: 2, 2: 3, 3: 4, 4:1, 5:2}, 'From Country': {0: 'Spain', 1: 'Scotland', 2: 'England', 3: 'England', 4: 'Scotland', 5:'England'}, 'From Town': {0: 'Madrid', 1: 'Edinburgh', 2: 'London', 3: 'London', 4: 'Edinburgh', 5: 'Manchester'}, 'FromStreet': {0: 'Spanish St', 1: 'Main St', 2: 'Lower St', 3: 'Middle St', 4: 'London St', 5: 'Scotland St'}, 'ToCountry': {0: 'Scotland', 1: 'England', 2: 'England', 3: 'England', 4: 'England', 5: 'England'}, 'ToTown': {0: 'Edinburgh', 1: 'London', 2: 'London', 3: 'London', 4: 'Liverpool', 5: 'London'}, 'ToStreet': {0: 'Lower St', 1: 'Middle St', 2: 'Upper St', 3: 'Upper St', 4: 'new St', 5: 'Old St'}}
sample_data = pd.DataFrame.from_dict(dict)

#example data set. I want to find every unique 'fromCountry' for both John and Sally. So For John we would just have the first row, where he enters from Spain to Scotland. The second row would be filtered as Scotland appears in the 'ToCountry' column. Sally would just have the 'FromCountry' Edinburgh row. 

I have tried to do like this:

sample_grouped = sample_data.groupby('AttributeName')
sample_grouped[~sample_grouped['From Country'].isin(sample_grouped['ToCountry'])]

but I get there error 'SeriesGroupBy' object has no attribute 'isin'

Does anyone know how to use the isin (or comparable) function on grouped by data?

Thanks

CodePudding user response:

The error is self-explanatory, the isin method you are trying to use is not there in Pandas Groupby object.

You can call apply on pandas groupby object, then pass a lambda function that returns only the rows that match the criteria.

out = (sample_data.groupby('AttributeName')
        .apply(lambda x: x[~x['From Country'].isin(x['ToCountry'])])
    )

OUTPUT:

                AttributeName  Lineage Step From Country  From Town FromStreet ToCountry     ToTown  ToStreet
AttributeName                                                                                                
John          0          John             1        Spain     Madrid Spanish St  Scotland  Edinburgh  Lower St
Sally         4         Sally             1     Scotland  Edinburgh  London St   England  Liverpool    new St

CodePudding user response:

use pandas query and find matching records then use unique() to group by

dict = {'AttributeName': {0: 'John', 1: 'John', 2: 'John', 3: 'John', 4: 'Sally', 5: 'Sally'}, 'Lineage Step': {0: 1, 1: 2, 2: 3, 3: 4, 4:1, 5:2}, 'From Country': {0: 'Spain', 1: 'Scotland', 2: 'England', 3: 'England', 4: 'Scotland', 5:'England'}, 'From Town': {0: 'Madrid', 1: 'Edinburgh', 2: 'London', 3: 'London', 4: 'Edinburgh', 5: 'Manchester'}, 'FromStreet': {0: 'Spanish St', 1: 'Main St', 2: 'Lower St', 3: 'Middle St', 4: 'London St', 5: 'Scotland St'}, 'ToCountry': {0: 'Scotland', 1: 'England', 2: 'England', 3: 'England', 4: 'England', 5: 'England'}, 'ToTown': {0: 'Edinburgh', 1: 'London', 2: 'London', 3: 'London', 4: 'Liverpool', 5: 'London'}, 'ToStreet': {0: 'Lower St', 1: 'Middle St', 2: 'Upper St', 3: 'Upper St', 4: 'new St', 5: 'Old St'}}
df = pd.DataFrame.from_dict(dict)

results=df.query('`From Country` not in ToCountry')
print(results['From Country'].unique())

output:

['Spain']
  • Related