How can we search for a few strings in a column and multiply another column by a constant?

Time:12-22

I thought I could search for a string in a column and, if it is found, multiply the value in another column by a constant, like this:

df_merged['MaintCost'] = df_merged.loc[df_merged['Code_Description'].str.contains('03 Tree','17 Tree'), 'AvgTotal_OH_Miles'] * 15
df_merged['MaintCost'] = df_merged.loc[df_merged['Code_Description'].str.contains('26 Vines'), 'AvgTotal_OH_Miles'] * 5
df_merged['MaintCost'] = df_merged.loc[df_merged['Code_Description'].str.contains('overgrown primary', 'Tree fails'), 'AvgTotal_OH_Miles'] * 12

This isn't working: the 'Code_Description' column contains strings such as '03 Tree', yet 'MaintCost' comes back as NaN. What am I missing here?

Here's an example to illustrate the point. I am using slightly different names for the DataFrame and its columns.

import pandas as pd

data = [{'Month': '2020-01-01', 'Expense':1000, 'Revenue':-50000, 'Building':'03 Tree'},
       {'Month': '2020-02-01', 'Expense':3000, 'Revenue':40000, 'Building':'17 Tree'},
       {'Month': '2020-03-01', 'Expense':7000, 'Revenue':50000, 'Building':'Tree fails'}, 
       {'Month': '2020-04-01', 'Expense':3000, 'Revenue':40000, 'Building':'overgrown primary'},
       {'Month': '2020-01-01', 'Expense':5000, 'Revenue':-6000, 'Building':'Tree fails'}, 
       {'Month': '2020-02-01', 'Expense':5000, 'Revenue':4000, 'Building':'26 Vines'},
       {'Month': '2020-03-01', 'Expense':5000, 'Revenue':9000, 'Building':'26 Vines'},
       {'Month': '2020-04-01', 'Expense':6000, 'Revenue':10000, 'Building':'Tree fails'}]
df = pd.DataFrame(data)
df

df['MaintCost'] = df.loc[df['Building'].str.contains('03 Tree','17 Tree'), 'Expense'] * 15
df['MaintCost'] = df.loc[df['Building'].str.contains('26 Vines'), 'Expense'] * 5
df['MaintCost'] = df.loc[df['Building'].str.contains('overgrown primary', 'Tree fails'), 'Expense'] * 12
df['MaintCost'] = df.loc[df['Building'].str.contains('Tree fails'), 'Expense'] * 10

df['MaintCost'] = df['MaintCost'].fillna(100)

df

Result: (screenshot of the resulting DataFrame; image not available)

For one thing, I would expect 15000 (1000 * 15) in row 0, but I am getting 100 because row 0 comes back as NaN before the fillna step!
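A minimal sketch of why row 0 ends up as NaN, using a hypothetical three-row subset of the example data. Each line of the form df['MaintCost'] = df.loc[mask, 'Expense'] * k replaces the whole column with a Series that only contains the masked rows; pandas aligns it on the index, so every other row becomes NaN and each line discards the result of the line before it. Only the last line (the 'Tree fails' rows) survives. Separately, the second positional argument of str.contains is case, not a second pattern, so str.contains('03 Tree', '17 Tree') only looks for '03 Tree'. The sketch then shows one way to assign into matching rows only.

```python
import pandas as pd

# Hypothetical trimmed version of the question's example data
df = pd.DataFrame({
    "Expense": [1000, 3000, 7000],
    "Building": ["03 Tree", "17 Tree", "Tree fails"],
})

# Reproducing the problem: the right-hand side only has the rows where the
# mask is True, so after index alignment the other rows become NaN.
df["MaintCost"] = df.loc[df["Building"].str.contains("Tree fails"), "Expense"] * 10
print(df["MaintCost"].tolist())  # [nan, nan, 70000.0]

# Fix: put alternatives in ONE regex joined with "|" (the second positional
# argument of str.contains is `case`), and assign only to the matching rows.
df["MaintCost"] = 100.0  # default for rows that match no pattern
tree_1 = df["Building"].str.contains("03 Tree|17 Tree")
tree_fails = df["Building"].str.contains("Tree fails")
df.loc[tree_1, "MaintCost"] = df.loc[tree_1, "Expense"] * 15
df.loc[tree_fails, "MaintCost"] = df.loc[tree_fails, "Expense"] * 10
print(df["MaintCost"].tolist())  # [15000.0, 45000.0, 70000.0]
```

Because each df.loc[mask, 'MaintCost'] = ... line writes only to its own rows, the order of the lines no longer wipes out earlier results, and the default of 100 replaces the separate fillna step.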

Answer:

We can try a slightly different approach: map each exact Building value to its multiplier in a dictionary and multiply Expense by it:

pats = {'03 Tree':15,
        '17 Tree':15,
        '26 Vines':5,
        'overgrown primary':12,
        'Tree fails':10}

df['MaintCost'] = df.apply(lambda x: x.Expense * pats.get(x.Building,0), axis=1)
print(df)
Output:
        Month  Expense  Revenue           Building  MaintCost
0  2020-01-01     1000   -50000            03 Tree      15000
1  2020-02-01     3000    40000            17 Tree      45000
2  2020-03-01     7000    50000         Tree fails      70000
3  2020-04-01     3000    40000  overgrown primary      36000
4  2020-01-01     5000    -6000         Tree fails      50000
5  2020-02-01     5000     4000           26 Vines      25000
6  2020-03-01     5000     9000           26 Vines      25000
7  2020-04-01     6000    10000         Tree fails      60000
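The dictionary lookup above matches whole values exactly, which fits the example data. Two vectorized variants are sketched below under that same multiplier table: Series.map for exact matches (avoiding a row-by-row apply), and numpy.select with str.contains for the substring search the question originally asked about. The extra 'Unknown' row is hypothetical and only shows what happens to values that match nothing; the column name MaintCostSubstr is likewise illustrative.

```python
import numpy as np
import pandas as pd

# Subset of the question's data plus one hypothetical unmatched row
df = pd.DataFrame({
    "Expense": [1000, 3000, 7000, 3000, 5000, 2000],
    "Building": ["03 Tree", "17 Tree", "Tree fails",
                 "overgrown primary", "26 Vines", "Unknown"],
})

# Exact matches: map each Building value to its multiplier; unmatched -> 0
pats = {"03 Tree": 15, "17 Tree": 15, "26 Vines": 5,
        "overgrown primary": 12, "Tree fails": 10}
df["MaintCost"] = df["Expense"] * df["Building"].map(pats).fillna(0)

# Substring matches: one boolean mask per group of patterns. np.select picks
# the multiplier of the FIRST condition that is True for each row, and uses
# `default` when none is True.
building = df["Building"]
conditions = [
    building.str.contains("03 Tree|17 Tree"),
    building.str.contains("26 Vines"),
    building.str.contains("overgrown primary"),
    building.str.contains("Tree fails"),
]
multipliers = [15, 5, 12, 10]
df["MaintCostSubstr"] = df["Expense"] * np.select(conditions, multipliers, default=0)

print(df[["Building", "MaintCost", "MaintCostSubstr"]])
```

The str.contains patterns are treated as regular expressions by default, so a pattern containing characters such as "(" or "." would need escaping (for example with re.escape) or regex=False. With np.select, put the more specific patterns first, since a row that matches several conditions takes the first one.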