How to perform the computation of a new column for a dataframe with the calculation provided as a string

Time:10-13

I have a pandas dataframe like this:

import pandas as pd

df = pd.DataFrame({'1': [10, 20, 30], '2': [100, 200, 300]})

#    1    2
#0  10  100
#1  20  200
#2  30  300

The goal is to calculate a new column. However, the calculation is provided as a string:

calc = '{1} + {2}'

How can I calculate a new column based on the existing columns and the provided calculation?

What I tried:

My initial idea was to use apply on the dataframe and lambda to make the calculation. Before that I would adjust the calculation string accordingly. However, that would make the use of eval necessary:

for i in range(10):
    calc = calc.replace('{' + str(i) + '}', 'row["' + str(i) + '"]')
# outputs calc = 'row["1"] + row["2"]'

df['new_col'] = df.apply(lambda row: eval(calc), axis=1)
# basically:    df.apply(lambda row: eval('row["1"] + row["2"]'), axis=1)

Since I want to avoid eval, I am looking for a different solution.

CodePudding user response:

You could use pandas' DataFrame.eval method, but you would need to remove the curly brackets, and the column names cannot be numeric, because eval would read them as number literals rather than column references.

One option would be to adapt the string to add a prefix (e.g. col) using a regex:

import re

calc = '{1} + {2}'

# turn each {name} placeholder into a prefixed identifier
query = re.sub('{([^}]+)}', r'col\1', calc)
# col1 + col2

df['new_col'] = df.add_prefix('col').eval(query)

output:

    1    2  new_col
0  10  100      110
1  20  200      220
2  30  300      330
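
For reuse, the same steps can be wrapped in a small function. This is only a sketch: the function name add_computed_column, its parameters, and the check for unknown placeholders are illustrative assumptions, not part of pandas. It assumes every placeholder has the form {column_name} and that prefixing the name yields a valid Python identifier.

```python
import re

import pandas as pd


def add_computed_column(df, calc, new_col, prefix="col"):
    """Return a copy of df with new_col computed from a '{name}' expression.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Collect the placeholder names, e.g. ['1', '2'] for '{1} + {2}'
    names = re.findall(r"\{([^}]+)\}", calc)

    # Fail early if the expression refers to a column that does not exist
    known = {str(c) for c in df.columns}
    missing = [n for n in names if n not in known]
    if missing:
        raise KeyError(f"unknown columns in expression: {missing}")

    # '{1} + {2}' -> 'col1 + col2'
    query = re.sub(r"\{([^}]+)\}", lambda m: prefix + m.group(1), calc)

    out = df.copy()
    # Evaluate against a renamed view so the names are valid identifiers
    out[new_col] = df.add_prefix(prefix).eval(query)
    return out


df = pd.DataFrame({"1": [10, 20, 30], "2": [100, 200, 300]})
result = add_computed_column(df, "{1} + {2}", "new_col")
print(result)
```

The original dataframe is left unchanged; only the returned copy carries the new column.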