I thought I could search for a string in one column and, where it is found, multiply the value in another column by a number, like this.
df_merged['MaintCost'] = df_merged.loc[df_merged['Code_Description'].str.contains('03 Tree','17 Tree'), 'AvgTotal_OH_Miles'] * 15
df_merged['MaintCost'] = df_merged.loc[df_merged['Code_Description'].str.contains('26 Vines'), 'AvgTotal_OH_Miles'] * 5
df_merged['MaintCost'] = df_merged.loc[df_merged['Code_Description'].str.contains('overgrown primary', 'Tree fails'), 'AvgTotal_OH_Miles'] * 12
This can't be working, because I have a string like '03 Tree' in the 'Code_Description' column, yet 'MaintCost' is NaN for that row. What am I missing here?
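One likely culprit is the call `str.contains('03 Tree','17 Tree')`. The second positional argument of `Series.str.contains` is `case`, not a second pattern, so only '03 Tree' is actually searched for. A minimal sketch on a made-up Series (the name `codes` and its values are illustrative only, not taken from `df_merged`) compares that call with joining the alternatives with `|`:

```python
import pandas as pd

# Hypothetical sample values, for illustration only.
codes = pd.Series(['03 Tree', '17 Tree', '26 Vines'])

# The second positional argument is `case`, so '17 Tree' is not used as a pattern.
single = codes.str.contains('03 Tree', '17 Tree')

# Joining the alternatives with '|' builds one regex that matches either string.
either = codes.str.contains('03 Tree|17 Tree')

print(single.tolist())
print(either.tolist())
```

Under these assumptions only the first value matches in `single`, while `either` also picks up '17 Tree'.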
Here's an example to illustrate the point. I am using slightly different names for the dataframe and its columns.
import pandas as pd

data = [{'Month': '2020-01-01', 'Expense':1000, 'Revenue':-50000, 'Building':'03 Tree'},
{'Month': '2020-02-01', 'Expense':3000, 'Revenue':40000, 'Building':'17 Tree'},
{'Month': '2020-03-01', 'Expense':7000, 'Revenue':50000, 'Building':'Tree fails'},
{'Month': '2020-04-01', 'Expense':3000, 'Revenue':40000, 'Building':'overgrown primary'},
{'Month': '2020-01-01', 'Expense':5000, 'Revenue':-6000, 'Building':'Tree fails'},
{'Month': '2020-02-01', 'Expense':5000, 'Revenue':4000, 'Building':'26 Vines'},
{'Month': '2020-03-01', 'Expense':5000, 'Revenue':9000, 'Building':'26 Vines'},
{'Month': '2020-04-01', 'Expense':6000, 'Revenue':10000, 'Building':'Tree fails'}]
df = pd.DataFrame(data)
df
df['MaintCost'] = df.loc[df['Building'].str.contains('03 Tree','17 Tree'), 'Expense'] * 15
df['MaintCost'] = df.loc[df['Building'].str.contains('26 Vines'), 'Expense'] * 5
df['MaintCost'] = df.loc[df['Building'].str.contains('overgrown primary', 'Tree fails'), 'Expense'] * 12
df['MaintCost'] = df.loc[df['Building'].str.contains('Tree fails'), 'Expense'] * 10
df['MaintCost'] = df['MaintCost'].fillna(100)
df
Result:
For one thing, I would expect to see 15000 in row zero, but I am getting 100 because row zero comes back as NaN!
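The NaN in row zero is consistent with how these assignments work. `df.loc[mask, 'Expense'] * 15` holds only the matching rows, and assigning it to `df['MaintCost']` replaces the whole column, leaving every other row NaN. Only the last assignment therefore survives before `fillna(100)`. A minimal sketch of one way around it puts the mask on the left-hand side so each step writes only its own rows. It assumes a trimmed copy of the example data and the multipliers from the example, with 'Tree fails' taken as 10 because that line runs last; the `rules` list is a name introduced here for illustration.

```python
import pandas as pd

# Trimmed version of the example data (only the columns needed here).
df = pd.DataFrame({
    'Expense': [1000, 3000, 7000, 3000, 5000],
    'Building': ['03 Tree', '17 Tree', 'Tree fails', 'overgrown primary', '26 Vines'],
})

# Assumed pattern -> multiplier pairs; alternatives are joined with '|'.
rules = [
    ('03 Tree|17 Tree', 15),
    ('26 Vines', 5),
    ('overgrown primary', 12),
    ('Tree fails', 10),
]

# Start from the fallback value, then fill in only the rows each pattern matches.
df['MaintCost'] = 100
for pattern, factor in rules:
    mask = df['Building'].str.contains(pattern)
    df.loc[mask, 'MaintCost'] = df.loc[mask, 'Expense'] * factor

print(df)
```

Because every write is restricted to `mask`, earlier results are not wiped out by later steps, and rows that match no pattern keep the fallback of 100.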
Answer:
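We can try a slightly different approach: keep the multipliers in a dictionary keyed by the exact Building value and look each row up with apply. Values that are not in the dictionary get a multiplier of 0.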
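# Multiplier for each exact Building value
pats = {'03 Tree': 15,
        '17 Tree': 15,
        '26 Vines': 5,
        'overgrown primary': 12,
        'Tree fails': 10}

# Look up each row's multiplier (0 if the Building is not listed) and apply it to Expense
df['MaintCost'] = df.apply(lambda x: x.Expense * pats.get(x.Building, 0), axis=1)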
print(df)
'''
        Month  Expense  Revenue           Building  MaintCost
0  2020-01-01     1000   -50000            03 Tree      15000
1  2020-02-01     3000    40000            17 Tree      45000
2  2020-03-01     7000    50000         Tree fails      70000
3  2020-04-01     3000    40000  overgrown primary      36000
4  2020-01-01     5000    -6000         Tree fails      50000
5  2020-02-01     5000     4000           26 Vines      25000
6  2020-03-01     5000     9000           26 Vines      25000
7  2020-04-01     6000    10000         Tree fails      60000