Here's my dataset
Id Text
1 Animation_and_Cartoon - Comics and Anime/Cartoon_and_anime
2 Animation_and_Cartoon - Comics and Anime/Manga_and_anime
Expected output: every _ before the - is replaced by ' ', but every _ after the - is left unchanged.
Id Text
1 Animation and Cartoon - Comics and Anime/Cartoon_and_anime
2 Animation and Cartoon - Comics and Anime/Manga_and_anime
CodePudding user response:
You can use:
# match everything before the first '-' and replace the underscores inside that match
df['Text'] = df['Text'].str.replace(
    r'^([^-]+)',
    lambda m: m.group().replace('_', ' '),
    regex=True)
Output:
Id Text
0 1 Animation and Cartoon - Comics and Anime/Cartoon_and_anime
1 2 Animation and Cartoon - Comics and Anime/Manga_and_anime
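To see what the callable replacement does without pandas, here is a minimal sketch using the standard-library re module. The helper name spaces_before_first_hyphen and the second sample row are made up for illustration and are not part of the question's data.

```python
import re

def spaces_before_first_hyphen(text: str) -> str:
    # ^[^-]+ matches the run of non-hyphen characters at the start of the string;
    # the callable receives that match and returns the text to substitute for it.
    return re.sub(r'^[^-]+', lambda m: m.group().replace('_', ' '), text)

rows = [
    'Animation_and_Cartoon - Comics and Anime/Cartoon_and_anime',
    'no_hyphen_here',  # hypothetical row without any hyphen
]
for row in rows:
    print(spaces_before_first_hyphen(row))
```

Note that when a row contains no hyphen at all, the pattern matches the whole string, so every underscore in that row is replaced.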
CodePudding user response:
# you can replace the underscores using a lookahead
df['Text'] = df['Text'].str.replace(r'_(?=.*-)', ' ', regex=True)
'Animation and Cartoon - Comics and Anime/Cartoon_and_anime'
'Animation and Cartoon - Comics and Anime/Manga_and_anime'
- _: Match an underscore.
- (?=.*-): Positive lookahead that succeeds only if a - appears somewhere later in the string. It consumes no characters, so only the underscore itself is replaced.
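One consequence of the lookahead is that it keys off the last hyphen in the string, not the first. The sketch below uses plain re.sub to show this. The helper name spaces_before_last_hyphen and the last two sample rows are hypothetical, added only to illustrate the edge cases.

```python
import re

def spaces_before_last_hyphen(text: str) -> str:
    # An underscore is replaced only when a hyphen appears somewhere after it.
    return re.sub(r'_(?=.*-)', ' ', text)

print(spaces_before_last_hyphen('Animation_and_Cartoon - Comics and Anime/Manga_and_anime'))
print(spaces_before_last_hyphen('a_b - c_d - e_f'))  # hypothetical row with two hyphens
print(spaces_before_last_hyphen('no_hyphen_here'))   # hypothetical row with no hyphen
```

With two hyphens, the underscores between them are replaced as well ('a b - c d - e_f'), and a row without a hyphen is left untouched. If only the text before the first hyphen should change, a pattern anchored at the start of the string such as ^[^-]+ is the safer choice.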