How to replace string on pandas dataframe before certain characters

Time:05-18

Here's my dataset

Id  Text
1   Animation_and_Cartoon - Comics and Anime/Cartoon_and_anime
2   Animation_and_Cartoon - Comics and Anime/Manga_and_anime

The expected output has every _ before the - replaced by a space (' '), while everything after the - is left unchanged:

Id  Text
1   Animation and Cartoon - Comics and Anime/Cartoon_and_anime
2   Animation and Cartoon - Comics and Anime/Manga_and_anime

CodePudding user response:

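You can match everything before the first - and run a replacement function on just that part:

df['Text'] = df['Text'].str.replace(
    r'^([^-]+)',  # one or more non-hyphen characters from the start of the string
    lambda m: m.group().replace('_and_', ' and '),
    regex=True)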

Output:

   Id                                                        Text
0   1  Animation and Cartoon - Comics and Anime/Cartoon_and_anime
1   2    Animation and Cartoon - Comics and Anime/Manga_and_anime
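
For a self-contained check, the snippet below rebuilds the two sample rows and applies the same idea. It is only a minimal sketch: the helper name spaces_before_hyphen is made up for illustration, and it replaces every _ in the leading part (not just '_and_'), which is closer to what the question asks for.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Id': [1, 2],
    'Text': [
        'Animation_and_Cartoon - Comics and Anime/Cartoon_and_anime',
        'Animation_and_Cartoon - Comics and Anime/Manga_and_anime',
    ],
})

def spaces_before_hyphen(match):
    # Hypothetical helper: match.group() is the text before the first '-'
    return match.group().replace('_', ' ')

# '^[^-]+' matches from the start of the string up to the first '-'
df['Text'] = df['Text'].str.replace(r'^[^-]+', spaces_before_hyphen, regex=True)
print(df['Text'].tolist())
```

Passing a callable as the replacement requires regex=True; the callable receives a match object and must return the replacement string.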

CodePudding user response:

# you can replace the underscores using a lookahead
df['Text'] = df['Text'].str.replace(r'_(?=.*-)', ' ', regex=True)

Output:

'Animation and Cartoon - Comics and Anime/Cartoon_and_anime'
'Animation and Cartoon - Comics and Anime/Manga_and_anime'

How the pattern works:
  • _: Match an underscore
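  • (?=.*-): Lookahead that succeeds only if a - appears somewhere later in the string (zero or more characters followed by a -). It consumes no characters, so only the underscore itself is replaced.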
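
The two approaches agree on the sample data but differ when a string contains more than one -. The sketch below uses the standard re module on a made-up string (the value of s is an assumption, not from the question) to show the difference: the lookahead replaces underscores before the last -, while the anchored pattern only touches the part before the first -.

```python
import re

# Hypothetical input with two hyphens, used only to compare the patterns
s = 'a_b - c_d - e_f'

# Lookahead: replace every '_' that still has a '-' somewhere after it
with_lookahead = re.sub(r'_(?=.*-)', ' ', s)

# Anchored match: rewrite only the text before the first '-'
with_anchor = re.sub(r'^[^-]+', lambda m: m.group().replace('_', ' '), s)

print(with_lookahead)  # a b - c d - e_f
print(with_anchor)     # a b - c_d - e_f
```

If the text after the first - may itself contain a -, the anchored version is probably the safer choice.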