I have a datafile as follows:
Activity | Hazard | Condition | consequence |
---|---|---|---|
welding | fire | chamber | high |
painting | falling | none | medium |
I need to create a fifth column based on the values in the activity, hazard, condition, and consequence columns. The conditions are as follows
if the activity column includes "working" or "performing" return 'none' if the condition column includes 'none' return 'none' else return datafile.Hazard.map(str) " is " datafile.Condition.map(str) " impact " datafile.consequence.map(str)
I wrote the following code using regular expressions and dictionaries. But it didn't provide an answer. It is really appreciated if someone can give an answer
Dt1 = ['working','performing duties']
Dt2 = ["none"]
Dt1_regex = "|".join(Dt1)
Dt2_regex = "|".join(Dt2)
def clean(x):
if datafile.Activity.str.contains (Dt1_regex, regex=True) | datafile.Condition.str.contains(Dt2_regex, regex=True):
return 'none'
else:
return datafile.Hazard.map(str) " is " datafile.Condition.map(str)" impact " datafile.consequence.map(str)
datafile['combined'] = datafile.apply(clean)
CodePudding user response:
You can create a bunch of conditions, and a list of the associated values when the conditions are true
. You can then create the new column as shown in the example below:
Conditions:
- When
Activity == 'working'
ORActivity == 'performing'
- set tonone
- When
Condition == 'none'
- set tonone
Otherwise set to:
df.Hazard
' is '
df.Condition
' impact '
df.consequence
Code:
import numpy as np
import pandas as pd
df = pd.DataFrame({ 'Activity': ['welding', 'painting', 'working'],
'Hazard': ['fire', 'falling', 'drowning'],
'Condition': ['chamber', 'none', 'wet'],
'consequence': ['high', 'medium', 'high']})
# Create your conditions
conditions = [
(df['Activity'] == 'working') | (df['Activity'] == 'performing'),
(df['Condition'] == 'none')
]
# create a list of the values we want to assign for each condition
values = ['none', 'none']
# Create the new column based on the conditions and values
df['combined'] = np.select(conditions, values, default = "x")
df.loc[df['combined'] == 'x', 'combined'] = df.Hazard ' is ' df.Condition " impact " df.consequence
print(df)