Given the following dataset:
ID | Purchased |
---|---|
100 | 0 |
100 | 1 |
100 | 0 |
100 | 1 |
101 | 1 |
101 | 1 |
101 | 1 |
101 | 1 |
102 | 0 |
102 | 0 |
102 | 0 |
102 | 0 |
103 | 1 |
103 | 1 |
103 | 1 |
103 | 1 |
104 | 0 |
104 | 1 |
104 | 0 |
104 | 1 |
In a pandas dataframe how could I locate and count the following the ID that have only the Purchased value of ‘1’ reoccurring (like ID: 101 and 103). And also how can I locate and count the identifiers that have changing binary values (Like ID: 100 and 104).
I have tried to use the .diff along with groupby ID but it didn’t work.
df[‘diffPurchased’]= df.groupby(‘ID’)[‘Purchased’].diff()
Then I took the new feature diffPurchased to return the rows that have a changing ‘Purchased’ variable but it still didn’t accurately return all changing identifiers.
CodePudding user response:
The simplest if you want visual inspection might be to use an aggregation as set:
out = df.groupby('ID')['Purchased'].agg(set)
Output:
ID
100 {0, 1}
101 {1}
102 {0}
103 {1}
104 {0, 1}
Name: Purchased, dtype: object
Then you can do:
out[out == {1}].index.tolist()
# [101, 103]
out[out == {0, 1}].index.tolist()
# [100, 104]
Or even:
out = df.groupby('ID')['Purchased'].agg(frozenset)
dic = out.index.groupby(out)
print(dic)
Output:
{(1): [101, 103], (0): [102], (0, 1): [100, 104]}