Using regex in python to delete (or replace) parentheses and items inside them

Time:11-15

I have a csv file that looks like the following:

Halley Bailey - 1998 
Hayley Orrantia (1994-) American actress, singer, and songwriter 
Ken Watanabe (actor) 
etc...

I’d like to remove the items in the parentheses, as well as the commas in some of the names that have commas, so that the dataframe looks like this:

Halley Bailey
Hayley Orrantia
Ken Watanabe

I tried the following code, which succeeds in removing the dates after the name, but not the parentheses or the text after commas. How could I expand it so that it removes all of these items?

regex = '|'.join(map(re.escape, df['actors']))

CodePudding user response:

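Try the pattern r'(^[^\(|^\-]+)', which captures everything from the start of the string up to the first - or ( (inside the brackets, | and the second ^ are literal characters, so the match stops at those as well):

df['Full Name'] = df['Description'].str.extract(r'(^[^\(|^\-]+)')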

Returning:

                                         Description        Full Name
0                               Halley Bailey - 1998    Halley Bailey 
1  Hayley Orrantia (1994-) American actress, sing...  Hayley Orrantia 
2                               Ken Watanabe (actor)     Ken Watanabe 
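
Note that the extracted names above keep a trailing space, and that the character class stops at any hyphen, so a hyphenated first name would be cut short. Below is a minimal sketch of an alternative, using only the standard re module. It deletes everything from the first " - ", "(" or "," onwards. The helper name clean_name and the hyphenated sample string are assumptions made for illustration, and treating a comma as a cut-off point is only one reading of the question.

```python
import re

# Optional whitespace, then one of: a hyphen followed by whitespace,
# an opening parenthesis, or a comma; then everything up to the end.
PATTERN = r"\s*(?:-\s|\(|,).*$"

def clean_name(text: str) -> str:
    """Hypothetical helper: drop the trailing description and trim spaces."""
    return re.sub(PATTERN, "", text).strip()

rows = [
    "Halley Bailey - 1998 ",
    "Hayley Orrantia (1994-) American actress, singer, and songwriter ",
    "Ken Watanabe (actor) ",
    "Jean-Claude Van Damme (actor)",  # made-up row to show the hyphen case
]
for row in rows:
    print(clean_name(row))
```

On a dataframe column the same pattern string can be passed to pandas, for example df['Description'].str.replace(PATTERN, '', regex=True).str.strip().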

CodePudding user response:

Assuming that the CSV content is stored in the column csv of the dataframe df, and that df looks like the following (if one doesn't know how to read a CSV into a pandas DataFrame, see the first note below):

                                                 csv
0                               Halley Bailey - 1998
1  Hayley Orrantia (1994-) American actress, sing...
2                               Ken Watanabe (actor)

If one wants to create a new column named actors, and assuming that an actor's full name always consists of exactly two words, the following will do the job:

df['actors'] = df['csv'].str.split(' ').str[:2].str.join(' ')

[Out]:

                                                 csv           actors
0                               Halley Bailey - 1998    Halley Bailey
1  Hayley Orrantia (1994-) American actress, sing...  Hayley Orrantia
2                               Ken Watanabe (actor)     Ken Watanabe

If, on the other hand, one doesn't want to create a new column, one can overwrite the existing column instead:

df['csv'] = df['csv'].str.split(' ').str[:2].str.join(' ')

[Out]:

               csv
0    Halley Bailey
1  Hayley Orrantia
2     Ken Watanabe
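
The split approach depends on every name being exactly two words, so a longer name would be truncated. If the goal is only to delete the parentheses and whatever is inside them, as in the title, a sketch along the following lines may help. It uses the standard re module, the names PAREN_GROUP and drop_parentheses are made up for this example, and it does not handle nested parentheses.

```python
import re

# Optional leading whitespace, "(", any run of characters other than ")", then ")".
PAREN_GROUP = r"\s*\([^)]*\)"

def drop_parentheses(text: str) -> str:
    """Hypothetical helper: delete each parenthesised group, keep the rest."""
    return re.sub(PAREN_GROUP, "", text).strip()

rows = [
    "Ken Watanabe (actor)",
    "Hayley Orrantia (1994-) American actress",
]
for row in rows:
    print(drop_parentheses(row))
```

Unlike the split on spaces, this keeps any text that follows the closing parenthesis, so it would still need a second step to drop the description after the name. To replace the group rather than delete it, pass a different replacement string as the second argument of re.sub.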

Notes:
