For instance, I am trying to create new clean columns in the existing dataframe by applying a regex pattern, as shown below, but I get a SyntaxError saying that a keyword can't be an expression.
for col in cols2:
    df.assign(f"{col}_clean"=lambda df:df[col].str.replace(r"\(|\)|,", ""))
df.assign(f"{col}_clean"=lambda df:df[col].str.replace(r"\(|\)|,", ""))
^
SyntaxError: keyword can't be an expression
I then tried to assign a list of column names e.g.
cols2_clean = []
for col in cols2:
    clean = f"{col}_clean"
    cols2_clean.append(clean)
df.assign(cols2_clean=lambda df:df[cols2].str.replace(r"\(|\)|,", ""))
That didn't work either and raised AttributeError: 'DataFrame' object has no attribute 'str'. Is my only option to do this manually, one column at a time?
Answer:
df.assign() takes the column names as keyword arguments. You can't use a string (or any other expression, such as an f-string) as a keyword argument; it has to be a plain identifier.
What you can do is pass a dictionary using ** to turn it into keyword arguments:
for col in cols2:
    df = df.assign(**{f"{col}_clean": lambda df: df[col].str.replace(r"\(|\)|,", "", regex=True)})
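A self-contained sketch of this approach, assuming a small hypothetical frame whose column names ("col A", "col B") contain spaces, so they could never be written as keyword arguments directly. Note the regex=True: since pandas 2.0, Series.str.replace treats the pattern as a literal string by default, so without it the parentheses and commas would not be removed.

```python
import pandas as pd

# Hypothetical example data; the column names contain spaces on purpose.
df = pd.DataFrame({
    "col A": ["(1,000)", "2,500"],
    "col B": ["(x)", "y,z"],
})
cols2 = ["col A", "col B"]

for col in cols2:
    # The dict key can be any string; ** turns it into a keyword argument.
    # assign() calls the lambda right away, so it sees the current `col`.
    df = df.assign(
        **{f"{col}_clean": lambda d: d[col].str.replace(r"\(|\)|,", "", regex=True)}
    )

print(df)
```

If you instead build one dict of lambdas for all columns and call assign once, every lambda would see the last value of col (Python's late binding of closures); passing the Series directly, e.g. {f"{c}_clean": df[c].str.replace(r"\(|\)|,", "", regex=True) for c in cols2}, avoids that.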
Answer:
@Barmar's answer is correct and does exactly what you're trying to do. However, a more idiomatic pandas way to do it would be to skip the for-loop and use apply instead:
import pandas as pd
cols2 = ['col A', 'col B', 'col C']
df[pd.Index(cols2) + '_clean'] = df[cols2].apply(lambda col: col.str.replace(r"\(|\)|,", "", regex=True))
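A runnable sketch of the apply version, assuming hypothetical sample data for the three columns. Adding a string to a pd.Index appends it to every label, and in recent pandas versions assigning a DataFrame to a list of new column labels fills them positionally, one result column per new label.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "col A": ["(1,000)", "2,500"],
    "col B": ["(x)", "y,z"],
    "col C": ["a", "(b,c)"],
})
cols2 = ["col A", "col B", "col C"]

# Index + str appends the suffix to every label: "col A" -> "col A_clean".
new_cols = pd.Index(cols2) + "_clean"

# With the default axis=0, apply() passes each column (a Series) to the
# lambda, so the .str accessor is available inside it.
df[new_cols] = df[cols2].apply(
    lambda col: col.str.replace(r"\(|\)|,", "", regex=True)
)

print(df)
```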
When you call apply without specifying axis, it defaults to axis=0, which means it calls the lambda function once for each column.
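To see what axis changes, here is a small sketch on a hypothetical two-column frame: each call receives one Series, and its .name reveals whether apply handed over a column (axis=0) or a row (axis=1).

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"col A": ["(1)", "2"], "col B": ["x", "(y)"]})

# Each call receives one Series; its .name says which column or row it is.
per_column = df.apply(lambda s: s.name)       # default axis=0: one call per column
per_row = df.apply(lambda s: s.name, axis=1)  # axis=1: one call per row

print(per_column.tolist())  # ['col A', 'col B']
print(per_row.tolist())     # [0, 1]
```

Because axis=0 passes whole columns, the lambda in the answer above can use col.str directly, which is exactly what fails on the DataFrame returned by df[cols2].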