I have a dataframe like this (simplified):
| | amount | other_amt | rule_id |
|---:|:--------|:----------|---------:|
| 0 | 2 | 0 | 101 |
| 1 | 20 | 0.5 | 102 |
| 2 | 300 | 0 | 0 |
| 3 | 50 | 1 | 101 |
I then have a set of functions that apply each of these rules to the data, such as:
```python
def rule_101(df):
    return df['amount'] / 2

def rule_102(df):
    return df['other_amt']
```
I want to create a new column by applying the matching `rule_xxx(df)` function to each row, depending on what's in the `rule_id` column, using the content of `rule_id` to build the function name inside the command that creates the new column. Something like:
```python
df['new_col'] = np.where(df['rule_id'] == '0',
                         df['amount'],
                         locals()[f'rule_{df.rule_id}'](df))
```
This bit, `f'rule_{df.rule_id}'`, is what's causing me trouble. It interpolates the full series instead of a single value, so the lookup fails with an error like:

```
KeyError: 'rule_0 0\n1 0\n2 0\n3 0\n4 0\n ..\n495 0\n496 0\n497 0\n498 0\n499 0\nName: rule_id, Length: 500, dtype: object'
```
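To illustrate (a minimal sketch, not my real data): the f-string is evaluated once against the whole column rather than once per row, so the Series' string representation ends up inside the key:

```python
import pandas as pd

rule_id = pd.Series([101, 102, 0, 101], name='rule_id')
key = f'rule_{rule_id}'   # formats the entire Series, not a single value
print(key)                # something like 'rule_0    101\n1    102\n...'
```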
How can I "align" these two inputs, so that the value in `rule_id` for each row gets inserted into the f-string, calling the function for that specific `rule_id` on that specific row? Other approaches are also welcome of course, as long as I'm able to apply the function corresponding to the `rule_id` in each row. Thanks a lot.
CodePudding user response:
You can use a dictionary to look up the rules:
```python
def rule_101(df):
    return df['amount'] / 2

def rule_102(df):
    return df['other_amt']

# Map each rule_id to the function that implements it
ruleset = {
    0: lambda k: 0,   # no rule: return a constant
    101: rule_101,
    102: rule_102
}

def rules(row):
    # Look up the function for this row's rule_id and apply it to the row
    return ruleset[row['rule_id']](row)

df['new_col'] = df.apply(rules, axis=1)
```
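For reference, a minimal self-contained check with the sample data from the question (assuming `rule_id` holds integers; if it is an object/string column, as the `KeyError` in the question suggests, the dictionary keys would need to be strings like `'0'`, `'101'`, `'102'`):

```python
import pandas as pd

def rule_101(df):
    return df['amount'] / 2

def rule_102(df):
    return df['other_amt']

ruleset = {
    0: lambda k: 0,
    101: rule_101,
    102: rule_102
}

def rules(row):
    return ruleset[row['rule_id']](row)

df = pd.DataFrame({
    'amount': [2, 20, 300, 50],
    'other_amt': [0, 0.5, 0, 1],
    'rule_id': [101, 102, 0, 101],
})

df['new_col'] = df.apply(rules, axis=1)
print(df['new_col'].tolist())   # [1.0, 0.5, 0.0, 25.0]
```

Adding a new rule only requires writing another `rule_xxx` function and registering it in `ruleset`; the row-wise dispatch in `rules` stays unchanged.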