Add column based on numpy select in Pandas

Time:04-28

I'm trying to add a column to a pandas DataFrame based on the following numpy select statement.

I can get the values, but only as a separate DataFrame with one row per user:

import numpy as np
import pandas as pd

# between() is inclusive at both ends by default
f = pd.DataFrame(np.select(
    [
        df.groupby('usernumber')['date'].nunique().between(0, 3),
        df.groupby('usernumber')['date'].nunique().between(3, 5),
        df.groupby('usernumber')['date'].nunique() > 5
    ],
    [
        'Few',
        'Moderate',
        'Many'
    ],
    default='Unknown'
), columns=['UsageType'])
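`np.select` evaluates the conditions in order and returns the choice for the first one that is True. The NumPy-only sketch below uses a plain array as a hypothetical stand-in for the per-user counts, with the same thresholds as the conditions above. It shows how the overlapping ranges resolve: a count of 3 becomes 'Few' and a count of 5 becomes 'Moderate', whereas the desired output below labels the five-date user 'Many'.

```python
import numpy as np

# Hypothetical per-user counts of unique dates, chosen to hit each boundary
counts = np.array([1, 3, 4, 5, 6])

# Same thresholds as the question's conditions; both ends are inclusive,
# so 3 and 5 each satisfy two conditions
conditions = [
    (counts >= 0) & (counts <= 3),
    (counts >= 3) & (counts <= 5),
    counts > 5,
]
choices = ["Few", "Moderate", "Many"]

# np.select returns the choice for the FIRST condition that is True
labels = np.select(conditions, choices, default="Unknown")
print(labels)  # ['Few' 'Few' 'Moderate' 'Moderate' 'Many']
```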

Ideally, I would like the classified values added as a column of the main df, with each user's label repeated on every one of their rows:

df

usernumber  date      UsageType
12314       20220201  Few
12314       20220202  Few
12314       20220203  Few
32423       20220201  Moderate
32423       20220202  Moderate
32423       20220203  Moderate
32423       20220204  Moderate
43535       20220201  Many
43535       20220202  Many
43535       20220203  Many
43535       20220204  Many
43535       20220205  Many

Sample df data:

usernumber  date      Role       Task
12314       20220201  IT         logon
12314       20220202  IT         logon
12314       20220203  IT         logon
32423       20220201  DB         logon
32423       20220202  DB         logoff
32423       20220203  DB         logon
32423       20220204  DB         logon
43535       20220201  Admin      logon
43535       20220202  Admin      logon
43535       20220203  Admin      logoff
43535       20220204  Admin      logon
43535       20220205  Admin      logon
31249       20220206  Associate  logon
13151       20220206  Associate  logon
15146       20220201  UX         logon
15146       20220201  UX         logoff
15146       20220202  UX         logon
15146       20220202  UX         logoff
15146       20220203  UX         logon
15146       20220203  UX         logoff
15146       20220204  UX         logon
15146       20220205  UX         logoff
15146       20220205  UX         logon
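To reproduce the problem, the sample data above can be loaded with `pd.read_csv` on an in-memory string (a convenience for this sketch, not part of the original question). Counting unique dates per user yields one value per user, indexed by `usernumber`, rather than one value per row, which is why the frame built from `np.select` cannot be assigned to `df` as-is. Note also that user 15146 has nine rows but only five distinct dates; `nunique` counts the dates.

```python
import io

import pandas as pd

# The sample data from the question, whitespace-separated
SAMPLE = """\
usernumber date Role Task
12314 20220201 IT logon
12314 20220202 IT logon
12314 20220203 IT logon
32423 20220201 DB logon
32423 20220202 DB logoff
32423 20220203 DB logon
32423 20220204 DB logon
43535 20220201 Admin logon
43535 20220202 Admin logon
43535 20220203 Admin logoff
43535 20220204 Admin logon
43535 20220205 Admin logon
31249 20220206 Associate logon
13151 20220206 Associate logon
15146 20220201 UX logon
15146 20220201 UX logoff
15146 20220202 UX logon
15146 20220202 UX logoff
15146 20220203 UX logon
15146 20220203 UX logoff
15146 20220204 UX logon
15146 20220205 UX logoff
15146 20220205 UX logon
"""

df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

# One entry per user (indexed by usernumber), not one per row of df
per_user = df.groupby("usernumber")["date"].nunique()
print(per_user)
print(len(df), len(per_user))  # 23 rows vs 6 users
```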

Answer:

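You can assign the result of np.select directly to a new column. Because the groupby result has one value per user, first map it back onto every row with Series.map so that each condition has the same length as df: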

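import numpy as np
import pandas as pd

# Count unique dates per user, then map the per-user counts back onto every row
nunique = df['usernumber'].map(df.groupby('usernumber')['date'].nunique())

# np.select takes the first matching condition, so a count of 3 is 'Few'
# even though it also falls in the 'Moderate' range; between() is
# inclusive at both ends by default
df['UsageType'] = np.select(
    [
        nunique.between(0, 3),
        nunique.between(3, 4),
        nunique.ge(5)
    ],
    [
        'Few',
        'Moderate',
        'Many'
    ],
    default='Unknown'
)
print(df)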

    usernumber      date UsageType
0        12314  20220201       Few
1        12314  20220202       Few
2        12314  20220203       Few
3        32423  20220201  Moderate
4        32423  20220202  Moderate
5        32423  20220203  Moderate
6        32423  20220204  Moderate
7        43535  20220201      Many
8        43535  20220202      Many
9        43535  20220203      Many
10       43535  20220204      Many
11       43535  20220205      Many
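A shorter variant, offered as an alternative rather than the answer's method: `groupby(...)['date'].transform('nunique')` returns a Series already aligned with the rows of `df`, and `pd.cut` maps the counts into labelled bins. The bin edges below are chosen so that whole-number counts reproduce the thresholds above (1 to 3 is Few, 4 is Moderate, 5 or more is Many), and the frame is rebuilt from the output above.

```python
import numpy as np
import pandas as pd

# The three users from the output above
df = pd.DataFrame({
    "usernumber": [12314] * 3 + [32423] * 4 + [43535] * 5,
    "date": [20220201, 20220202, 20220203,
             20220201, 20220202, 20220203, 20220204,
             20220201, 20220202, 20220203, 20220204, 20220205],
})

# transform('nunique') broadcasts each user's count back to all of that user's rows
nunique = df.groupby("usernumber")["date"].transform("nunique")

# Right-closed bins: (0, 3] -> Few, (3, 4] -> Moderate, (4, inf] -> Many
df["UsageType"] = pd.cut(
    nunique,
    bins=[0, 3, 4, np.inf],
    labels=["Few", "Moderate", "Many"],
)
print(df)
```

Unlike `np.select`, `pd.cut` has no default label: a count outside the bins (for example 0) becomes NaN rather than 'Unknown', and the new column has the `category` dtype.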