pandas name x is not defined

import pandas as pd

def new_calculate(x, formulas):
    return pd.Series((eval(formula)) for formula in formulas)

col = {
    's1':{'l':100, 'w':200}, 
    's2':{'l':200, 'w':400}, 
    's3':{'l':300, 'w':500}
    }

coldf = pd.DataFrame.from_dict(col, orient='index')

cols = ['new_a', 'new_p']
formulas = ["x['l'] * x['w']", "x['l'] + x['w'] + x['l'] + x['w']"]

coldf[cols] = coldf.apply(lambda x : new_calculate(x, formulas), axis=1)

I am getting an error saying NameError: name 'x' is not defined

I am trying to produce a resulting dataframe with additional columns.

      l    w   new_a  new_p
s1  100  200   20000   600
s2  200  400   80000  1200
s3  300  500  150000  1600

What is wrong? Could itertuples be used here, or is there a more efficient way to do this?

I am trying to follow these examples: "Add Multiple Columns to Pandas Dataframe from Function" and "Merge dataframe with another dataframe created from apply function?"

CodePudding user response：

You should avoid using Python's eval.

Use pandas' eval instead:

coldf = pd.DataFrame.from_dict(col, orient='index')

cols = ['new_a', 'new_p']
formulas = ["l + w", "l + w + l + w"]

eval_str = '\n'.join(map('='.join, zip(cols, formulas)))
# 'new_a=l + w\nnew_p=l + w + l + w'

coldf = coldf.eval(eval_str)

Output:

      l    w  new_a  new_p
s1  100  200    300    600
s2  200  400    600   1200
s3  300  500    800   1600
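If the goal is the table from the question (where new_a looks like an area and new_p like a perimeter), the same multi-line DataFrame.eval approach should work with different expressions. The formulas below are an assumption read off that expected table, not something the question states explicitly:

```python
import pandas as pd

col = {
    "s1": {"l": 100, "w": 200},
    "s2": {"l": 200, "w": 400},
    "s3": {"l": 300, "w": 500},
}
coldf = pd.DataFrame.from_dict(col, orient="index")

# Assumed formulas: area = l * w, perimeter = 2 * (l + w)
formulas = {"new_a": "l * w", "new_p": "2 * (l + w)"}

# Build one "name = expression" assignment per line
eval_str = "\n".join(f"{name} = {expr}" for name, expr in formulas.items())

# A multi-line expression assigns several columns in one call
# and returns a new DataFrame (coldf itself is left unchanged)
result = coldf.eval(eval_str)
print(result)
```

Column names are used directly inside the expression strings, so no x[...] lookups are needed.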

CodePudding user response：

Like the other answer, I'd suggest you avoid eval. If you must use it, you need to pass x to eval explicitly as a local variable:

def new_calculate(x, formulas):
    # notice the dictionary
    return pd.Series((eval(formula, None, {'x':x})) for formula in formulas)

# this now can run
coldf[cols] = coldf.apply(lambda x : new_calculate(x, formulas), axis=1)

and gives you the expected output.
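As for why the original fails: the generator expression runs in its own nested scope, and eval without explicit namespaces only sees that scope's locals plus the module globals. x belongs to the enclosing function and is never mentioned by name inside the generator, so it is not captured. A minimal sketch without pandas (the names broken, fixed and row are just for illustration):

```python
def broken(x, formulas):
    # eval() looks up names in the generator's own scope and in the
    # module globals; x lives in neither, so this raises NameError
    return list(eval(formula) for formula in formulas)


def fixed(x, formulas):
    # Passing {"x": x} hands x to eval explicitly as a local name
    return list(eval(formula, None, {"x": x}) for formula in formulas)


row = {"l": 100, "w": 200}
formulas = ["x['l'] + x['w']"]

try:
    broken(row, formulas)
except NameError as exc:
    print("broken:", exc)

print("fixed:", fixed(row, formulas))  # fixed: [300]
```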

CodePudding user response：

Consider rewriting using assign instead of eval.

import pandas as pd

col = {"s1": {"l": 100, "w": 200}, "s2": {"l": 200, "w": 400}, "s3": {"l": 300, "w": 500}}

df = pd.DataFrame.from_dict(col, orient="index")

df = df.assign(
    new_p=df.l + df.w,
    new_a=df.l + df.w + df.l + df.w,
)

Output:

      l    w  new_p  new_a
s1  100  200    300    600
s2  200  400    600   1200
s3  300  500    800   1600

This uses vectorization, so it is faster than the apply approaches above:

10.6 ns ± 0.0192 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

This approach from the posts above:

coldf[cols] = coldf.apply(lambda x: new_calculate(x, formulas), axis=1)

is slower by a factor of tens of thousands, going by these timings:

460 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

We can gain significant performance by avoiding eval / apply and rewriting using assign and vectorization. The result is probably more readable too.
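To reproduce a comparison like this on your own data, something along these lines could work. It is only a sketch: with_assign, with_apply and compare are hypothetical helper names, the apply version is a simplified stand-in for the approach in the question, and the absolute numbers will depend on the machine and the size of the frame.

```python
import timeit

import pandas as pd

df = pd.DataFrame.from_dict(
    {"s1": {"l": 100, "w": 200}, "s2": {"l": 200, "w": 400}, "s3": {"l": 300, "w": 500}},
    orient="index",
)


def with_assign(frame):
    # Vectorized: each expression works on whole columns at once
    return frame.assign(
        new_p=frame.l + frame.w,
        new_a=frame.l + frame.w + frame.l + frame.w,
    )


def with_apply(frame):
    # Row-wise: the lambda is called once per row and builds a Series each time
    new_cols = frame.apply(
        lambda row: pd.Series(
            {
                "new_p": row["l"] + row["w"],
                "new_a": row["l"] + row["w"] + row["l"] + row["w"],
            }
        ),
        axis=1,
    )
    return frame.join(new_cols)


def compare(number=200):
    # Total seconds for `number` calls of each variant
    return {
        "assign": timeit.timeit(lambda: with_assign(df), number=number),
        "apply": timeit.timeit(lambda: with_apply(df), number=number),
    }


if __name__ == "__main__":
    print(compare())
```

Timing whole function calls like this includes the cost of building the new DataFrame, which is what you pay in practice.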

If we want to stick with evaluating strings for the formulas, we can do that inside assign with comparable performance:

columns = ["new_p", "new_a"]
formulas = ["df.l + df.w", "df.l + df.w + df.l + df.w"]

formula_mapping = dict(zip(columns, formulas))

df = df.assign(**{k: pd.eval(v) for k, v in formula_mapping.items()})

Output:

      l    w  new_p  new_a
s1  100  200    300    600
s2  200  400    600   1200
s3  300  500    800   1600

Timeit:

10.8 ns ± 0.0546 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
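If the formulas do not have to be strings, another option is to keep the name-to-formula mapping but store callables instead, since assign accepts a function that receives the DataFrame. This is a sketch of that variation rather than part of the approach above, and the lambdas are just the same two sums written out:

```python
import pandas as pd

col = {"s1": {"l": 100, "w": 200}, "s2": {"l": 200, "w": 400}, "s3": {"l": 300, "w": 500}}
df = pd.DataFrame.from_dict(col, orient="index")

# Each value is a function of the DataFrame; nothing is parsed from strings
formula_mapping = {
    "new_p": lambda d: d["l"] + d["w"],
    "new_a": lambda d: d["l"] + d["w"] + d["l"] + d["w"],
}

# assign calls each function with the DataFrame and stores the result
# under the matching column name
result = df.assign(**formula_mapping)
print(result)
```

This keeps the columns configurable through a dict while avoiding eval entirely.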