Home > Software engineering >  Pandas finding identifiers with a certain value
Pandas finding identifiers with a certain value

Time:12-29

Given the following dataset:

ID Purchased
100 0
100 1
100 0
100 1
101 1
101 1
101 1
101 1
102 0
102 0
102 0
102 0
103 1
103 1
103 1
103 1
104 0
104 1
104 0
104 1

In a pandas dataframe how could I locate and count the following the ID that have only the Purchased value of ‘1’ reoccurring (like ID: 101 and 103). And also how can I locate and count the identifiers that have changing binary values (Like ID: 100 and 104).

I have tried to use the .diff along with groupby ID but it didn’t work. df[‘diffPurchased’]= df.groupby(‘ID’)[‘Purchased’].diff()

Then I took the new feature diffPurchased to return the rows that have a changing ‘Purchased’ variable but it still didn’t accurately return all changing identifiers.

CodePudding user response:

The simplest if you want visual inspection might be to use an aggregation as set:

out = df.groupby('ID')['Purchased'].agg(set)

Output:

ID
100    {0, 1}
101       {1}
102       {0}
103       {1}
104    {0, 1}
Name: Purchased, dtype: object

Then you can do:

out[out == {1}].index.tolist()
# [101, 103]

out[out == {0, 1}].index.tolist()
# [100, 104]

Or even:

out = df.groupby('ID')['Purchased'].agg(frozenset)
dic = out.index.groupby(out)
print(dic)

Output:

{(1): [101, 103], (0): [102], (0, 1): [100, 104]}
  • Related