str.contains not working when there is not a space between the word and special character-CodePudding

I have a dataframe which includes the names of movie titles and TV Series.

From specific keywords I want to classify each row as Movie or Title according to these key words. However, due to brackets not having a space between the key words they are not being picked up by the str.contains() funtion and I need to do a workaround.

This is my dataframe:

import pandas as pd
import numpy as np

watched_df = pd.DataFrame([['Love Death Robots (Episode 1)'], 
                   ['James Bond'],
                   ['How I met your Mother (Avnsitt 3)'], 
                   ['random name'],
                   ['Random movie 3 Episode 8383893']], 
                  columns=['Title'])
watched_df.head()

To add the column that classifies the titles as TV series or Movies I have the following code.

watched_df["temporary_brackets_removed_title"] = watched_df['Title'].str.replace('(', '')
watched_df["Film_Type"] = np.where(watched_df.temporary_brackets_removed_title.astype(str).str.contains(pat = 'Episode | Avnsitt', case = False), 'Series', 'Movie')
watched_df = watched_df.drop('temporary_brackets_removed_title', 1)
watched_df.head()

Is there a simpler way to solve this without having to add and drop a column?

Maybe a str.contains-like function that does not look at a string being the exact same but just containing the given word? Similar to how in SQL you have the "Like" functionality?

CodePudding user response：

You can use str.contains and then map the results:

watched_df['Film_Type'] = watched_df['Title'].str.contains(r'(?:Episode|Avnsitt)').map({True: 'Series', False: 'Movie'})

Output:

>>> watched_df
                               Title Film_Type
0      Love Death Robots (Episode 1)    Series
1                         James Bond     Movie
2  How I met your Mother (Avnsitt 3)    Series
3                        random name     Movie
4     Random movie 3 Episode 8383893    Series