import pandas as pd

def new_calculate(x, formulas):
    return pd.Series((eval(formula)) for formula in formulas)
col = {
    's1': {'l': 100, 'w': 200},
    's2': {'l': 200, 'w': 400},
    's3': {'l': 300, 'w': 500}
}
coldf = pd.DataFrame.from_dict(col, orient='index')
cols = ['new_a', 'new_p']
formulas = ["x['l'] * x['w']", "x['l'] + x['w'] + x['l'] + x['w']"]
coldf[cols] = coldf.apply(lambda x : new_calculate(x, formulas), axis=1)
I am getting this error: NameError: name 'x' is not defined
I am trying to produce a DataFrame with the additional columns:
      l    w   new_a  new_p
s1  100  200   20000    600
s2  200  400   80000   1200
s3  300  500  150000   1600
What is wrong? Can itertuples be used, or is there a more efficient way to do this?
I am trying to follow these examples: "Add Multiple Columns to Pandas Dataframe from Function" and "Merge dataframe with another dataframe created from apply function?"
CodePudding user response:
You should avoid using Python's eval. Rather, use pandas' eval:
coldf = pd.DataFrame.from_dict(col, orient='index')
cols = ['new_a', 'new_p']
formulas = ["l + w", "l + w + l + w"]
eval_str = '\n'.join(map('='.join, zip(cols, formulas)))
# 'new_a=l + w\nnew_p=l + w + l + w'
coldf = coldf.eval(eval_str)
Output:
      l    w  new_a  new_p
s1  100  200    300    600
s2  200  400    600   1200
s3  300  500    800   1600
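As a minimal sketch of the same DataFrame.eval technique, the version below assumes the formulas the question seems to intend (l * w for new_a and l + w + l + w for new_p, inferred from the expected output in the question rather than stated anywhere in this answer). It builds one "name = expression" line per new column and evaluates them in a single call.

```python
import pandas as pd

col = {
    "s1": {"l": 100, "w": 200},
    "s2": {"l": 200, "w": 400},
    "s3": {"l": 300, "w": 500},
}
coldf = pd.DataFrame.from_dict(col, orient="index")

cols = ["new_a", "new_p"]
# Assumed formulas, inferred from the expected output shown in the question.
formulas = ["l * w", "l + w + l + w"]

# One "name = expression" assignment per line; column names are used directly.
eval_str = "\n".join(f"{name} = {expr}" for name, expr in zip(cols, formulas))

# DataFrame.eval returns a new DataFrame with the assigned columns appended.
result = coldf.eval(eval_str)
print(result)
```

Because the expressions refer to columns by name, no row variable such as x is needed, which sidesteps the NameError from the question.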
CodePudding user response:
I'd suggest you avoid eval, as in the other answer. If you must use it, you need to pass your x to eval as a local variable:
def new_calculate(x, formulas):
    # notice the dictionary
    return pd.Series((eval(formula, None, {'x': x})) for formula in formulas)
# this now can run
coldf[cols] = coldf.apply(lambda x : new_calculate(x, formulas), axis=1)
and gives you the expected output.
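Why the explicit dictionary helps can be sketched without pandas. The generator expression runs in its own scope, and eval with no extra arguments only sees that scope's locals plus the module globals, so the enclosing function's x is not visible. The names broken, fixed and row below are hypothetical, and the formulas are assumed from the question's expected output.

```python
def broken(x, formulas):
    # The generator expression has its own scope; eval() looks up names in
    # that scope and in the module globals, so the argument `x` is not found.
    return list(eval(formula) for formula in formulas)


def fixed(x, formulas):
    # An explicit locals mapping makes `x` visible inside eval().
    return list(eval(formula, None, {"x": x}) for formula in formulas)


row = {"l": 100, "w": 200}  # stands in for one DataFrame row
formulas = ["x['l'] * x['w']", "x['l'] + x['w'] + x['l'] + x['w']"]

try:
    broken(row, formulas)
    broken_error = None
except NameError as exc:
    broken_error = str(exc)

fixed_result = fixed(row, formulas)
print(broken_error)
print(fixed_result)
```

Note that the broken version would appear to work if a module-level variable named x happened to exist, which makes this kind of bug easy to miss.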
CodePudding user response:
Consider rewriting using assign instead of eval.
import pandas as pd
col = {"s1": {"l": 100, "w": 200}, "s2": {"l": 200, "w": 400}, "s3": {"l": 300, "w": 500}}
df = pd.DataFrame.from_dict(col, orient="index")
df = df.assign(
    new_p=df.l + df.w,
    new_a=df.l + df.w + df.l + df.w,
)
Output:
      l    w  new_p  new_a
s1  100  200    300    600
s2  200  400    600   1200
s3  300  500    800   1600
This uses vectorization, so it is faster than the apply approaches above:
10.6 ns ± 0.0192 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
The apply approach from the answers above:
coldf[cols] = coldf.apply(lambda x: new_calculate(x, formulas), axis=1)
is slower by a factor of thousands:
460 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
We can gain significant performance by avoiding eval / apply and rewriting using assign and vectorization. The result is probably more readable too.
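For anyone who wants to reproduce the comparison, here is a rough sketch using the standard timeit module. The helper names with_assign and with_apply are hypothetical, the apply version computes the formulas directly instead of going through eval, and the numbers will vary by machine and pandas version, so no timings are claimed here.

```python
import timeit

import pandas as pd

col = {"s1": {"l": 100, "w": 200}, "s2": {"l": 200, "w": 400}, "s3": {"l": 300, "w": 500}}
df = pd.DataFrame.from_dict(col, orient="index")


def with_assign(frame):
    # Vectorized: each expression operates on whole columns at once.
    return frame.assign(
        new_p=frame.l + frame.w,
        new_a=frame.l + frame.w + frame.l + frame.w,
    )


def with_apply(frame):
    # Row-wise: the lambda is called once per row and builds a Series each time.
    out = frame.copy()
    out[["new_p", "new_a"]] = frame.apply(
        lambda x: pd.Series([x["l"] + x["w"], x["l"] + x["w"] + x["l"] + x["w"]]),
        axis=1,
    )
    return out


runs = 200
assign_seconds = timeit.timeit(lambda: with_assign(df), number=runs)
apply_seconds = timeit.timeit(lambda: with_apply(df), number=runs)
print(f"assign: {assign_seconds / runs * 1e6:.1f} us per call")
print(f"apply:  {apply_seconds / runs * 1e6:.1f} us per call")
```

Timing complete function calls on the real DataFrame, as above, makes sure the measured statement actually performs the computation.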
If we want to keep the formulas as strings, we can evaluate them with pd.eval inside assign, with comparable performance:
columns = ["new_p", "new_a"]
formulas = ["df.l + df.w", "df.l + df.w + df.l + df.w"]
formula_mapping = dict(zip(columns, formulas))
df = df.assign(**{k: pd.eval(v) for k, v in formula_mapping.items()})
Output:
      l    w  new_p  new_a
s1  100  200    300    600
s2  200  400    600   1200
s3  300  500    800   1600
Timeit:
10.8 ns ± 0.0546 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)