I have a pandas dataframe, and I want to replace certain strings in one column. A string could look like "Spiderman is Nr 1", and I want to turn it into "Spiderman (Nr 1)". The only part of the string that stays the same is "is Nr". The superhero and the number change, and not every superhero has a number. So the dataframe could look like this:
Superheros
Spiderman is Nr 1
Batman is Nr 4
Joker
Iron Man is Nr 2
Hulk
Captain America
Wonderwoman is Nr 3
And I want to change this dataframe so that every "is Nr \d" becomes "(Nr \d)":
Superheros
Spiderman (Nr 1)
Batman (Nr 4)
Joker
Iron Man (Nr 2)
Hulk
Captain America
Wonderwoman (Nr 3)
I found that I can replace strings in one column like this:
df["Superheros"] = df["Superheros"].str.replace('is Nr', '(Nr')
But this obviously is missing the final bracket.
I would like to use regex, but I don't know how to access the string in the columns. I think the pattern should be something like r'is Nr \d', but I don't know how to pass the number to the replacing string.
I tried
df["Superheros"] = df["Superheros"].str.replace(r'is Nr \d', r'(Nr \d)')
df["Superheros"] = df["Superheros"].str.re.sub(r'is Nr \d', r'(Nr \d)')
but I get errors, because this is apparently not how to use regex on a column.
I hope it is clear what I am looking for. If you need any more info, let me know. I know there is a lot of regex things here on stackoverflow, but I didn't find the combination of things I am looking for.
CodePudding user response:
You can use
df["Superheros"] = df["Superheros"].str.replace(r'\bis\s+(Nr\s*\d+)', r'(\1)', regex=True)
See the regex demo
Details
\b - a word boundary
is - the word "is"
\s+ - one or more whitespaces
(Nr\s*\d+) - Capturing group 1 (\1 in the replacement pattern refers to this group value): Nr, zero or more whitespaces (\s*), and one or more digits (\d+).
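To see how the capturing group and the \1 backreference carry the number over into the replacement, here is a minimal sketch using Python's re module on plain strings. The names pattern, samples and results are just placeholders for illustration.

```python
import re

# Same pattern as above: the word "is", whitespace, then a captured "Nr <digits>".
pattern = re.compile(r'\bis\s+(Nr\s*\d+)')

# Illustrative sample strings taken from the question.
samples = ["Spiderman is Nr 1", "Iron Man is Nr 2", "Joker"]

# \1 in the replacement inserts whatever group 1 matched,
# so the number is carried over and wrapped in parentheses.
results = [pattern.sub(r'(\1)', s) for s in samples]
print(results)  # ['Spiderman (Nr 1)', 'Iron Man (Nr 2)', 'Joker']
```

Strings without "is Nr" followed by digits, such as "Joker", simply do not match and are returned unchanged.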
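Note the use of regex=True: it avoids the FutureWarning that older pandas versions emit when the regex argument is omitted, and in pandas 2.0 and later, where regex defaults to False, it is required for the pattern to be treated as a regular expression at all.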
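Putting it together, this is a small self-contained sketch that rebuilds the dataframe from the question and applies the replacement. It assumes the column is called "Superheros" as in the question and that the values are plain strings.

```python
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame({
    "Superheros": [
        "Spiderman is Nr 1",
        "Batman is Nr 4",
        "Joker",
        "Iron Man is Nr 2",
        "Hulk",
        "Captain America",
        "Wonderwoman is Nr 3",
    ]
})

# Replace "is Nr <digits>" with "(Nr <digits>)"; rows without a match stay unchanged.
df["Superheros"] = df["Superheros"].str.replace(
    r'\bis\s+(Nr\s*\d+)', r'(\1)', regex=True
)

print(df["Superheros"].tolist())
```

Rows such as "Joker" or "Hulk" contain no match, so they come out of the replacement untouched.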