For example, identify IDs which leave 5 star on one app and 1 star on another and vice versa. Data frame has 1,050,000 rows.
ID Star App_Name
0 41a-Z451 5 A
1 40591758 5 A
2 41128444 5 R
3 40537354 2 H
4 41a-Z451 1 B
5 40552356 1 I
6 413lll95 4 K
7 40htt506 5 A
8 41322993 3 C
9 414ju239 4 D
Output: (something like below)
ID Star App_Name User0n1
0 41a-Z451 5 A 1
1 40591758 5 A 0
2 41128444 5 R 0
3 40537354 2 H 0
4 41a-Z451 1 B 1
5 40552356 1 I 0
6 413lll95 4 K 0
7 40htt506 5 A 0
8 41322993 3 C 0
9 414ju239 4 D 0
CodePudding user response:
I modified a bit the input to have one more user with multiple votes.
NB. the app column shouldn't matter in the computation as a user can (usually) only vote once per app, thus the apps are necessarily unique per user.
If you strictly want to identify users that voted both a 5 and a 1, you can use np.ptp
that will give you 4
in such case:
import numpy as np
df['Weird_User'] = df.groupby('ID')['Star'].transform(np.ptp).eq(4).astype(int)
If you specifically want to match all values from a list of conditions:
match = [1, 5]
df['Weird_User'] = (df.groupby('ID')['Star']
.transform(lambda x: all(v in x.values for v in match))
.astype(int)
)
output:
ID Star Weird_User
0 41a-Z451 5 1
1 40591758 5 0
2 41128444 5 0
3 41128444 2 0
4 41a-Z451 1 1
5 40552356 1 0
6 41128444 4 0
7 40htt506 5 0
8 41322993 3 0
9 414ju239 4 0
CodePudding user response:
If need compare 1,5
values per ID
and different App_Name
s first aggregate set
s:
df1 = df.groupby('ID')[['Star','App_Name']].agg(set)
print (df1)
Star App_Name
ID
40537354 {2} {H}
40552356 {1} {I}
40591758 {5} {A}
40htt506 {5} {A}
41128444 {5} {R}
41322993 {3} {C}
413lll95 {4} {K}
414ju239 {4} {D}
41a-Z451 {1, 5} {A, B}
And then filter ID
if multiple values in App_Name
and if match 1,5
or 5,1
:
ids = df1.index[df1['App_Name'].str.len().gt(1) & df1['Star'].eq(set({1,5}))]
df['Weird_User'] = df['ID'].isin(ids).astype(int)
print (df)
ID Star App_Name Weird_User
0 41a-Z451 5 A 1
1 40591758 5 A 0
2 41128444 5 R 0
3 40537354 2 H 0
4 41a-Z451 1 B 1
5 40552356 1 I 0
6 413lll95 4 K 0
7 40htt506 5 A 0
8 41322993 3 C 0
9 414ju239 4 D 0