Home > front end >  Create new data frame columns with return in function
Create new data frame columns with return in function

Time:03-03

In the DataFrame below, I need to check the shop column and add two columns based on shop

datetime dtnum time tnum shop
02-03-2022 05:25 20220302052500 05:25:00 52500 PRESS1
02-03-2022 05:26 20220302052600 05:26:00 52600 BODY1
02-03-2022 05:27 20220302052700 05:27:00 52700 BODY2

I tried with below code. that is using same return to make two columns

# Apply shop_groupcode
def shop_chk(column):
    if column['shop'] == 'PRESS1' or 'PRESS1_A' or 'E176' or 'E177' or 'E184' or 'E185' or 'E186' or 'E187':
        return 1, 1  # shop code , plant code
    if column['shop'] == 'BODY1' or 'BODY1_A' or 'E179' or 'E180' or 'E181' or 'E208' or \
            'E216' or 'E217' or 'E218' or 'E232':
        return 2, 1 # shop code , plant code
    if column['shop'] == 'BODY2' or 'BODY2_A' or 'E196' or 'E197' or 'E198' or 'E199' or 'E200':
        return 8, 2 # shop code , plant code

df['shop_code', 'plant_code'] = df.apply(shop_chk, axis=1, result_type="expand")

Code runs without error, two columns are created- but the column values are all 1

Requirement: I want to know if there is mistake in code or if there is any efficient method as I have some more shop conditions to check

datetime dtnum time tnum shop shop_code plant_code
02-03-2022 05:25 20220302052500 05:25:00 52500 PRESS1 1 1
02-03-2022 05:26 20220302052600 05:26:00 52600 BODY1 2 1
02-03-2022 05:27 20220302052700 05:27:00 52700 BODY2 8 2

CodePudding user response:

You can use isin instead of multiple ORs and store those conditions in a list and use numpy.select:

import numpy as np
conditions = [df['shop'].isin(['PRESS1','PRESS1_A','E176','E177','E184','E185','E186','E187']), 
              df['shop'].isin(['BODY1','BODY1_A','E179','E180','E181','E208','E216','E217','E218','E232']),
              df['shop'].isin(['BODY2','BODY2_A','E196','E197','E198','E199','E200'])
             ]
df['shop_code'] = np.select(conditions, [1, 2, 8])
df['plant_code'] = np.select(conditions, [1, 1, 2])

Output:

           datetime           dtnum      time   tnum    shop  shop_code  plant_code  
0  02-03-2022 05:25  20220302052500  05:25:00  52500  PRESS1          1           1  
1  02-03-2022 05:26  20220302052600  05:26:00  52600   BODY1          2           1
2  02-03-2022 05:27  20220302052700  05:27:00  52700   BODY2          8           2

FYI, the correct syntax is:

(column['shop'] == 'PRESS1') or (column['shop'] == 'PRESS1_A') or ...

because

column['shop'] == 'PRESS1' or 'PRESS1_A' or ...

returns either True or 'PRESS1_A' (not a truth-value).

CodePudding user response:

You need to use isin in combination with providing a list of values:

    if column['shop'].isin(['PRESS1', 'PRESS1_A', 'E176', 'E177', 'E184', 'E185', 'E186', 'E187']):

otherwise it will always stick on the first if-statement, because the part after the or is always True and hence will always give 1 as a result.

  • Related