Explanation of pandas DataFrame.assign() behaviour using lambda

Time:12-29

import pandas as pd
import numpy as np

np.random.seed(99)
rows = 10
df = pd.DataFrame ({'A' : np.random.choice(range(0, 2), rows, replace = True),
                    'B' : np.random.choice(range(0, 2), rows, replace = True)})


def get_C1(row):
    return row.A + row.B

def get_C2(row):
    return 'X' if row.A + row.B == 0 else 'Y'

def get_C3(row):
    is_zero = row.A + row.B
    return "X" if is_zero else "Y"

df = df.assign(C = lambda row: get_C3(row))

Why do the get_C2 and get_C3 functions raise an error?

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Answer:

You're thinking that df.assign, when passed a function, behaves like df.apply with axis=1, which calls the function for each row.

That's incorrect.

Per the docs for df.assign:

    "Where the value is a callable, evaluated on df"

That means that the function you pass to assign is called on the whole dataframe instead of each individual row.
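A minimal sketch to confirm this: the callable below records the type of whatever it receives. The small example frame and the names record_and_sum and seen are made up for illustration and are not part of the question.

```python
import pandas as pd

# Small hypothetical frame, used only for this illustration.
df = pd.DataFrame({"A": [0, 1, 1], "B": [0, 0, 1]})

seen = []  # type name of each argument the callable receives

def record_and_sum(arg):
    # assign passes the whole DataFrame here, not one row at a time
    seen.append(type(arg).__name__)
    return arg.A + arg.B  # column + column gives a Series, which assign accepts

out = df.assign(C=record_and_sum)

print(seen)               # one entry, because the callable runs once
print(out["C"].tolist())  # element-wise sums of A and B
```

This is also why a function that only does column arithmetic, like get_C1, works: it returns a whole column, which is exactly what assign expects.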


So, in your function get_C3, the row parameter is not a row at all. It's a whole dataframe (and should be renamed to df or something else) and so row.A and row.B are two whole columns, rather than single cell values.

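Thus, is_zero is a whole column as well, and ... if is_zero ... will not work: an if needs a single True or False, and pandas refuses to reduce a whole Series to one truth value, which is the ValueError you see. The same applies to get_C2, where the comparison with 0 produces a boolean column rather than a single boolean.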
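Two possible ways to get the intended column are sketched below, assuming the condition is meant to be "A + B equals 0". The first stays vectorised with np.where; the second keeps the per-row function and routes it through df.apply with axis=1 inside the callable. The example frame and the variable names vectorised and row_wise are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame, used only for this illustration.
df = pd.DataFrame({"A": [0, 1, 1, 0], "B": [0, 0, 1, 1]})

# Option 1: vectorised. The callable receives the whole frame (named d here),
# and np.where picks "X" or "Y" element-wise from the boolean column.
vectorised = df.assign(C=lambda d: np.where(d.A + d.B == 0, "X", "Y"))

# Option 2: row-wise. get_C2 really does receive one row at a time,
# because DataFrame.apply with axis=1 calls it once per row.
def get_C2(row):
    return "X" if row.A + row.B == 0 else "Y"

row_wise = df.assign(C=lambda d: d.apply(get_C2, axis=1))

print(vectorised)
print(row_wise)
```

The vectorised form is usually the faster of the two on large frames, since apply with axis=1 runs a Python function call for every row.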
