Pandas finding identifiers with a certain value-CodePudding

Given the following dataset:

ID	Purchased
100	0
100	1
100	0
100	1
101	1
101	1
101	1
101	1
102	0
102	0
102	0
102	0
103	1
103	1
103	1
103	1
104	0
104	1
104	0
104	1

In a pandas dataframe how could I locate and count the following the ID that have only the Purchased value of ‘1’ reoccurring (like ID: 101 and 103). And also how can I locate and count the identifiers that have changing binary values (Like ID: 100 and 104).

I have tried to use the .diff along with groupby ID but it didn’t work. df[‘diffPurchased’]= df.groupby(‘ID’)[‘Purchased’].diff()

Then I took the new feature diffPurchased to return the rows that have a changing ‘Purchased’ variable but it still didn’t accurately return all changing identifiers.

CodePudding user response：

The simplest if you want visual inspection might be to use an aggregation as set:

out = df.groupby('ID')['Purchased'].agg(set)

Output:

ID
100    {0, 1}
101       {1}
102       {0}
103       {1}
104    {0, 1}
Name: Purchased, dtype: object

Then you can do:

out[out == {1}].index.tolist()
# [101, 103]

out[out == {0, 1}].index.tolist()
# [100, 104]

Or even:

out = df.groupby('ID')['Purchased'].agg(frozenset)
dic = out.index.groupby(out)
print(dic)

Output:

{(1): [101, 103], (0): [102], (0, 1): [100, 104]}

ID	Purchased
100	0
100	1
100	0
100	1
101	1
101	1
101	1
101	1
102	0
102	0
102	0
102	0
103	1
103	1
103	1
103	1
104	0
104	1
104	0
104	1

ID	Purchased
100	0
100	1
100	0
100	1
101	1
101	1
101	1
101	1
102	0
102	0
102	0
102	0
103	1
103	1
103	1
103	1
104	0
104	1
104	0
104	1

ID	Purchased
100	0
100	1
100	0
100	1
101	1
101	1
101	1
101	1
102	0
102	0
102	0
102	0
103	1
103	1
103	1
103	1
104	0
104	1
104	0
104	1