I have a pandas dataframe like this:
import pandas as pd

df = pd.DataFrame({'1': [10, 20, 30], '2': [100, 200, 300]})
#     1    2
# 0  10  100
# 1  20  200
# 2  30  300
The goal is to calculate a new column. However, the calculation is provided as a string:
calc = '{1} + {2}'
How can I calculate a new column based on the existing columns and the provided calculation?
What I tried:
My initial idea was to use apply on the dataframe with a lambda to make the calculation. Before that, I would adjust the calculation string accordingly. However, that would make the use of eval necessary:
for i in range(10):
    calc = calc.replace('{' + str(i) + '}', 'row["' + str(i) + '"]')
# outputs calc = 'row["1"] + row["2"]'
df['new_col'] = df.apply(lambda row: eval(calc), axis=1)
# basically: df.apply(lambda row: eval('row["1"] + row["2"]'), axis=1)
Since I want to avoid eval, I am looking for a different solution.
Answer:
You could use pandas' eval method, but you would need to remove the curly brackets, and you cannot have numerical column names (names used in the expression must be valid Python identifiers).
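As a point of comparison, when the column names are already valid identifiers, DataFrame.eval can reference them directly with no renaming. A minimal sketch, assuming illustrative column names a and b that are not part of the question:

```python
import pandas as pd

# Illustrative frame (not from the question) whose column names are identifiers
df = pd.DataFrame({'a': [10, 20, 30], 'b': [100, 200, 300]})

# DataFrame.eval resolves bare names to columns, so no prefix is needed
result = df.eval('a + b')
print(result.tolist())  # [110, 220, 330]
```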
One option would be to adapt the string to add a prefix (e.g. col) using a regex:
import re

calc = '{1} + {2}'
# turn each {name} placeholder into col<name>
query = re.sub(r'\{([^}]+)\}', r'col\1', calc)
# query == 'col1 + col2'
df['new_col'] = df.add_prefix('col').eval(query)
output:
    1    2  new_col
0  10  100      110
1  20  200      220
2  30  300      330
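If even pandas' own expression parser is undesirable, the template can be interpreted without any string evaluation at all. The following is a minimal sketch of a different technique, not part of the answer above: it uses Python's ast module to parse the expression and walks the tree itself, accepting only +, -, *, /, unary minus, numeric constants, and {name} placeholders, and rejecting everything else. The function name evaluate_calc and the internal _c0, _c1, ... identifiers are hypothetical; it assumes every placeholder names an existing column, given either as a string ('1') or as an integer (2).

```python
import ast
import operator
import re

import pandas as pd

# Binary operators the sketch accepts; every other node type is rejected
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_calc(df: pd.DataFrame, calc: str) -> pd.Series:
    """Evaluate a template such as '{1} + {2}' column-wise without eval()."""
    names = {}  # hypothetical identifier (_c0, _c1, ...) -> text inside the braces

    def _to_ident(match: re.Match) -> str:
        # Replace each {name} with a safe identifier so ast can parse it
        ident = f"_c{len(names)}"
        names[ident] = match.group(1)
        return ident

    tree = ast.parse(re.sub(r"\{([^}]+)\}", _to_ident, calc), mode="eval")

    def _column(ident: str) -> pd.Series:
        key = names[ident]
        if key in df.columns:  # string column name such as '1'
            return df[key]
        if key.isdigit() and int(key) in df.columns:  # integer column name such as 2
            return df[int(key)]
        raise KeyError(f"no column named {key!r}")

    def _walk(node: ast.AST):
        if isinstance(node, ast.Expression):
            return _walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_walk(node.left), _walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_walk(node.operand)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.Name) and node.id in names:
            return _column(node.id)
        # Function calls, attribute access, unknown names, etc. are refused
        raise ValueError(f"unsupported element: {ast.dump(node)}")

    return _walk(tree)


df = pd.DataFrame({'1': [10, 20, 30], '2': [100, 200, 300]})
df['new_col'] = evaluate_calc(df, '{1} + {2}')
print(df)
```

The whitelist design means a string like '__import__("os")' raises ValueError instead of running, at the cost of supporting only the operators listed in _BIN_OPS.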