Why can I not use f-string in the pandas assign method?

Time:04-30

For instance, I am trying to create new clean columns in the existing dataframe with a regex pattern applied as shown below. I get the SyntaxError that a keyword can't be an expression.

for col in cols2:
    df.assign(f"{col}_clean"=lambda df:df[col].str.replace(r"\(|\)|,", ""))

    df.assign(f"{col}_clean"=lambda df:df[col].str.replace(r"\(|\)|,", ""))
             ^
SyntaxError: keyword can't be an expression

I then tried to assign a list of column names e.g.

    cols2_clean = []
    for col in cols2:
       clean = f"{col}_clean"
       cols2_clean.append(clean)

df.assign(cols2_clean=lambda df:df[cols2].str.replace(r"\(|\)|,", ""))

That didn't work either and raised AttributeError: 'DataFrame' object has no attribute 'str'. Is my only option to do this manually, one column at a time?

CodePudding user response:

df.assign() takes the new column names as keyword arguments. A keyword argument's name must be a plain identifier written literally in the code; it can't be an expression such as an f-string.

What you can do is pass a dictionary using ** to turn it into keyword arguments.

for col in cols2:
    df = df.assign(**{f"{col}_clean": lambda df: df[col].str.replace(r"\(|\)|,", "", regex=True)})
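
The same dictionary trick can also be done in a single assign call. Below is a minimal sketch, assuming a small made-up frame (the data and the name new_cols are hypothetical, not from the question). assign only evaluates the callables after the whole dictionary has been built, so the loop variable is frozen with a default argument (c=c); without it, every lambda would see the last column name. regex=True is passed explicitly because pandas 2.0 and later treat the pattern as a literal string by default.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({"col A": ["(1,000)", "2,500"], "col B": ["(3)", "4,0"]})
cols2 = ["col A", "col B"]

# Build the keyword arguments as a dict keyed by the new column names.
# c=c binds the current column name to each lambda at creation time.
new_cols = {
    f"{c}_clean": lambda d, c=c: d[c].str.replace(r"\(|\)|,", "", regex=True)
    for c in cols2
}

# ** unpacks the dict into keyword arguments for assign.
df = df.assign(**new_cols)
print(df)
```

Dictionary keys used this way only need to be strings, so names containing spaces such as "col A_clean" are fine.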

CodePudding user response:

@Barmar's answer is correct and does exactly what you're trying to do. However, a more idiomatic pandas approach is to skip the for-loop and use apply instead:

cols2 = ['col A', 'col B', 'col C']
df[pd.Index(cols2) + '_clean'] = df[cols2].apply(lambda col: col.str.replace(r"\(|\)|,", "", regex=True))

When you call apply without specifying axis, it'll default to axis=0, which means it'll call the lambda function for each column.
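
Here is a small runnable sketch of the apply approach, using made-up data (the values and the name cleaned are assumptions for illustration). Each column arrives in the lambda as a Series, which is why the .str accessor is available there even though it is not available on the DataFrame itself.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "col A": ["(1,000)", "2,500"],
    "col B": ["(3)", "4,0"],
    "col C": ["x", "(y)"],
})
cols2 = ["col A", "col B", "col C"]

# axis=0 is the default, so the lambda receives one column (a Series) at a time.
cleaned = df[cols2].apply(
    lambda col: col.str.replace(r"\(|\)|,", "", regex=True)
)

# pd.Index supports element-wise string concatenation, which gives the new names.
df[pd.Index(cols2) + "_clean"] = cleaned
print(df)
```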
