I have a DataFrame from which I create a new DataFrame using .str.contains. That part works. I then want to find each matched substring and add 'NEW' in front of it, but the way I am doing it produces 'NEWX|Y|Z' when I want 'NEWX' where it finds X, 'NEWY' where it finds Y, and so on.
substr = ['X', 'Y', 'Z']
df1 = df[df['short_description'].str.contains('|'.join(substr), na=False)]
This finds the correct rows.
But when I try to add NEW to the front:
df3['short_description'] = df3['short_description'].str.replace('|'.join(substr),'NEW' '_' '|'.join(substr))
It adds 'NEWX|Y|Z' in front of each one it finds. I understand why it is doing that, but I just want NEWX to replace X, NEWY to replace Y, etc.
How can I make it only replace like for like?
X, Y and Z aren't always at the start of the column value, which is why I am finding and replacing rather than just adding 'NEW' to the whole DataFrame column.
Thanks for any help.
CodePudding user response:
IIUC, use a capture group:
import pandas as pd
s = pd.Series(["Xsomething", "Ythatthing", "WhatZ", "Nothing"])
# Wrapping the alternation in () makes the matched letter available as \1
s.str.replace("(%s)" % "|".join("XYZ"), "NEW\\1", regex=True)
Output:
0 NEWXsomething
1 NEWYthatthing
2 WhatNEWZ
3 Nothing
dtype: object
A capture group works like "use what you found".
"(X|Y|Z)" looks for either X, Y or Z, and keeps the match as the first captured group (the effect of wrapping it in ()).
- Note that if you use multiple (), you can use \\1, \\2, ... \\n.
Then NEW\\1 uses this capture group to replace the match with NEW{capture_group_1}.
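Applied to the DataFrame from the question, the same idea might look like the sketch below. The sample rows are made up for illustration, and re.escape is an extra precaution (not part of the answer above) in case a substring contains regex metacharacters. The last line illustrates the note about multiple groups.

```python
import re

import pandas as pd

# Hypothetical data standing in for the question's df['short_description'].
df = pd.DataFrame(
    {"short_description": ["X at start", "ends with Y", "mid Z dle", "nothing", None]}
)
substr = ["X", "Y", "Z"]

# Escape each substring so it is matched literally, then join into X|Y|Z.
alternation = "|".join(re.escape(s) for s in substr)

# Keep only the matching rows; .copy() lets us assign to the result safely.
matched = df[df["short_description"].str.contains(alternation, na=False)].copy()

# Wrap the alternation in () so \1 refers to whichever substring matched.
matched["short_description"] = matched["short_description"].str.replace(
    "(%s)" % alternation, r"NEW\1", regex=True
)

# With several groups, \1 and \2 are numbered left to right.
swapped = pd.Series(["A12", "B7"]).str.replace(r"([A-Z])(\d+)", r"\2\1", regex=True)
```

The raw string r"NEW\1" is equivalent to "NEW\\1" in the answer; both pass a backslash followed by 1 to the regex engine.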