Python Pandas - What is the Pandas version of replace or append when working with multiple values?

I have a DataFrame from which I create a new DataFrame using .str.contains. This works fine. However, I then try to find the matching text and add 'NEW' to the front of it, and the way I am doing it produces 'NEWX|Y|Z' when I want 'NEWX' where it finds X, 'NEWY' where it finds Y, and so on.

substr = ['X', 'Y', 'Z']
df1 = df[df['short_description'].str.contains('|'.join(substr), na=False)]

This finds the correct rows.

But when I try to prepend NEW to each match:

df3['short_description'] = df3['short_description'].str.replace('|'.join(substr),'NEW' '_' '|'.join(substr))

It adds 'NEWX|Y|Z' in front of each one it finds. I understand why it does that, but I just want NEWX to replace X, NEWY to replace Y, and so on.

How can I make it only replace like for like?

X, Y and Z aren't always at the start of the value, which is why I am finding and replacing rather than just adding 'NEW' to the whole DataFrame column.

Thanks for any help.

CodePudding user response:

IIUC, use a capture group:

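import pandas as pd
s = pd.Series(["Xsomething", "Ythatthing", "WhatZ", "Nothing"])
# "(X|Y|Z)" captures whichever letter matched; \\1 in the replacement puts it back after "NEW"
s.str.replace("(%s)" % "|".join("XYZ"), "NEW\\1", regex=True)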

Output:

0    NEWXsomething
1    NEWYthatthing
2         WhatNEWZ
3          Nothing
dtype: object

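A capture group lets you reuse whatever the pattern matched.

"(X|Y|Z)" matches any one of X, Y or Z and stores the match as the first captured group (that is the effect of wrapping the alternation in parentheses).
- Note that if you use multiple () groups, you can refer to them as \\1, \\2, ..., \\n.
The replacement NEW\\1 then refers back to this group, so each match is replaced by NEW followed by the captured text. In a Python string literal, "\\1" is the two characters \1.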
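
Applied to the DataFrame from the question, the same idea might look like the sketch below. The sample rows are made up for illustration, and the use of re.escape is an added precaution (an assumption, not part of the original answer) in case the real search values contain regex metacharacters such as "." or "+".

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame(
    {"short_description": ["X at start", "ends with Y", "mid Z here", "no match", None]}
)

substr = ["X", "Y", "Z"]

# Escape each value so it is matched literally, then join into "X|Y|Z".
alternation = "|".join(re.escape(s) for s in substr)

# Keep only the rows that contain at least one of the values.
mask = df["short_description"].str.contains(alternation, na=False)
df3 = df[mask].copy()  # work on a copy, not a view of df

# Wrap the alternation in one capture group; \1 in the replacement
# refers back to whichever value matched on that row.
pattern = "(" + alternation + ")"
df3["short_description"] = df3["short_description"].str.replace(
    pattern, r"NEW\1", regex=True
)

result = df3["short_description"].tolist()
print(result)
```

The raw string r"NEW\1" is equivalent to "NEW\\1". The filter uses the alternation without parentheses because str.contains warns when the pattern contains match groups.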