In the DataFrame below, I need to check the shop column and add two columns based on shop
datetime | dtnum | time | tnum | shop |
---|---|---|---|---|
02-03-2022 05:25 | 20220302052500 | 05:25:00 | 52500 | PRESS1 |
02-03-2022 05:26 | 20220302052600 | 05:26:00 | 52600 | BODY1 |
02-03-2022 05:27 | 20220302052700 | 05:27:00 | 52700 | BODY2 |
I tried with below code. that is using same return to make two columns
# Apply shop_groupcode
def shop_chk(column):
if column['shop'] == 'PRESS1' or 'PRESS1_A' or 'E176' or 'E177' or 'E184' or 'E185' or 'E186' or 'E187':
return 1, 1 # shop code , plant code
if column['shop'] == 'BODY1' or 'BODY1_A' or 'E179' or 'E180' or 'E181' or 'E208' or \
'E216' or 'E217' or 'E218' or 'E232':
return 2, 1 # shop code , plant code
if column['shop'] == 'BODY2' or 'BODY2_A' or 'E196' or 'E197' or 'E198' or 'E199' or 'E200':
return 8, 2 # shop code , plant code
df['shop_code', 'plant_code'] = df.apply(shop_chk, axis=1, result_type="expand")
Code runs without error, two columns are created- but the column values are all 1
Requirement: I want to know if there is mistake in code or if there is any efficient method as I have some more shop conditions to check
datetime | dtnum | time | tnum | shop | shop_code | plant_code |
---|---|---|---|---|---|---|
02-03-2022 05:25 | 20220302052500 | 05:25:00 | 52500 | PRESS1 | 1 | 1 |
02-03-2022 05:26 | 20220302052600 | 05:26:00 | 52600 | BODY1 | 2 | 1 |
02-03-2022 05:27 | 20220302052700 | 05:27:00 | 52700 | BODY2 | 8 | 2 |
CodePudding user response:
You can use isin
instead of multiple ORs and store those conditions in a list and use numpy.select
:
import numpy as np
conditions = [df['shop'].isin(['PRESS1','PRESS1_A','E176','E177','E184','E185','E186','E187']),
df['shop'].isin(['BODY1','BODY1_A','E179','E180','E181','E208','E216','E217','E218','E232']),
df['shop'].isin(['BODY2','BODY2_A','E196','E197','E198','E199','E200'])
]
df['shop_code'] = np.select(conditions, [1, 2, 8])
df['plant_code'] = np.select(conditions, [1, 1, 2])
Output:
datetime dtnum time tnum shop shop_code plant_code
0 02-03-2022 05:25 20220302052500 05:25:00 52500 PRESS1 1 1
1 02-03-2022 05:26 20220302052600 05:26:00 52600 BODY1 2 1
2 02-03-2022 05:27 20220302052700 05:27:00 52700 BODY2 8 2
FYI, the correct syntax is:
(column['shop'] == 'PRESS1') or (column['shop'] == 'PRESS1_A') or ...
because
column['shop'] == 'PRESS1' or 'PRESS1_A' or ...
returns either True or 'PRESS1_A' (not a truth-value).
CodePudding user response:
You need to use isin
in combination with providing a list of values:
if column['shop'].isin(['PRESS1', 'PRESS1_A', 'E176', 'E177', 'E184', 'E185', 'E186', 'E187']):
otherwise it will always stick on the first if-statement, because the part after the or
is always True and hence will always give 1 as a result.