Home > Mobile >  How to identify column values with specific condition for values in other two column in Pandas?
How to identify column values with specific condition for values in other two column in Pandas?

Time:09-06

For example, identify IDs which leave 5 star on one app and 1 star on another and vice versa. Data frame has 1,050,000 rows.

           ID  Star  App_Name
0    41a-Z451     5         A
1    40591758     5         A
2    41128444     5         R
3    40537354     2         H
4    41a-Z451     1         B
5    40552356     1         I
6    413lll95     4         K
7    40htt506     5         A
8    41322993     3         C
9    414ju239     4         D

Output: (something like below)

           ID  Star  App_Name    User0n1
0    41a-Z451     5         A          1
1    40591758     5         A          0
2    41128444     5         R          0
3    40537354     2         H          0
4    41a-Z451     1         B          1
5    40552356     1         I          0
6    413lll95     4         K          0
7    40htt506     5         A          0
8    41322993     3         C          0
9    414ju239     4         D          0

CodePudding user response:

I modified a bit the input to have one more user with multiple votes.

NB. the app column shouldn't matter in the computation as a user can (usually) only vote once per app, thus the apps are necessarily unique per user.

If you strictly want to identify users that voted both a 5 and a 1, you can use np.ptp that will give you 4 in such case:

import numpy as np
df['Weird_User'] = df.groupby('ID')['Star'].transform(np.ptp).eq(4).astype(int)

If you specifically want to match all values from a list of conditions:

match = [1, 5]
df['Weird_User'] = (df.groupby('ID')['Star']
                      .transform(lambda x: all(v in x.values for v in match))
                      .astype(int)
                    )

output:

         ID  Star  Weird_User
0  41a-Z451     5           1
1  40591758     5           0
2  41128444     5           0
3  41128444     2           0
4  41a-Z451     1           1
5  40552356     1           0
6  41128444     4           0
7  40htt506     5           0
8  41322993     3           0
9  414ju239     4           0

CodePudding user response:

If need compare 1,5 values per ID and different App_Names first aggregate sets:

df1 = df.groupby('ID')[['Star','App_Name']].agg(set)
print (df1)
            Star App_Name
ID                       
40537354     {2}      {H}
40552356     {1}      {I}
40591758     {5}      {A}
40htt506     {5}      {A}
41128444     {5}      {R}
41322993     {3}      {C}
413lll95     {4}      {K}
414ju239     {4}      {D}
41a-Z451  {1, 5}   {A, B}

And then filter ID if multiple values in App_Name and if match 1,5 or 5,1:

ids = df1.index[df1['App_Name'].str.len().gt(1) & df1['Star'].eq(set({1,5}))]

df['Weird_User'] = df['ID'].isin(ids).astype(int)
print (df)
         ID  Star App_Name  Weird_User
0  41a-Z451     5        A           1
1  40591758     5        A           0
2  41128444     5        R           0
3  40537354     2        H           0
4  41a-Z451     1        B           1
5  40552356     1        I           0
6  413lll95     4        K           0
7  40htt506     5        A           0
8  41322993     3        C           0
9  414ju239     4        D           0
  • Related