str.contains not working when there is not a space between the word and special character

Time:11-26

I have a dataframe that contains the titles of movies and TV series.

I want to classify each row as a Movie or a Series based on specific keywords. However, when a bracket sits directly in front of a keyword, with no space between them, the str.contains() function does not pick the keyword up, so I need a workaround.

This is my dataframe:

import pandas as pd
import numpy as np

watched_df = pd.DataFrame([['Love Death Robots (Episode 1)'], 
                   ['James Bond'],
                   ['How I met your Mother (Avnsitt 3)'], 
                   ['random name'],
                   ['Random movie 3 Episode 8383893']], 
                  columns=['Title'])
watched_df.head()

To add the column that classifies the titles as TV series or Movies I have the following code.

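# regex=False treats '(' as a literal character rather than a regex group opener
watched_df["temporary_brackets_removed_title"] = watched_df['Title'].str.replace('(', '', regex=False)
watched_df["Film_Type"] = np.where(watched_df.temporary_brackets_removed_title.astype(str).str.contains(pat = 'Episode | Avnsitt', case = False), 'Series', 'Movie')
# the positional axis argument of drop() is no longer accepted in recent pandas, so name the column explicitly
watched_df = watched_df.drop(columns='temporary_brackets_removed_title')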
watched_df.head()

Is there a simpler way to solve this without having to add and drop a column?

Is there a str.contains-like function that only checks whether a string contains the given word, instead of requiring an exact match, similar to the LIKE operator in SQL?

Answer:

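str.contains already does substring matching; the problem is the pattern. In 'Episode | Avnsitt' the spaces around | are part of the regex, so it matches 'Episode ' or ' Avnsitt', and '(Avnsitt' has no leading space. Drop the spaces and you can use str.contains directly on the Title column, then map the results: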

watched_df['Film_Type'] = watched_df['Title'].str.contains(r'(?:Episode|Avnsitt)').map({True: 'Series', False: 'Movie'})

Output:

>>> watched_df
                               Title Film_Type
0      Love Death Robots (Episode 1)    Series
1                         James Bond     Movie
2  How I met your Mother (Avnsitt 3)    Series
3                        random name     Movie
4     Random movie 3 Episode 8383893    Series
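
To see why the pattern from the question misses "(Avnsitt", here is a minimal sketch that uses Python's re module directly. It assumes that str.contains with a regex pattern behaves like re.search applied to each string. The helper classify and the variable names are made up for illustration.

```python
import re

titles = [
    "Love Death Robots (Episode 1)",
    "James Bond",
    "How I met your Mother (Avnsitt 3)",
    "random name",
    "Random movie 3 Episode 8383893",
]

# Pattern from the question: the spaces around "|" are literal characters,
# so this means "Episode " (trailing space) OR " Avnsitt" (leading space).
spaced = re.compile(r"Episode | Avnsitt", flags=re.IGNORECASE)

# Same alternation without the stray spaces: "Episode" OR "Avnsitt"
# anywhere in the string, whatever character comes before or after.
tight = re.compile(r"Episode|Avnsitt", flags=re.IGNORECASE)


def classify(title, pattern):
    """Hypothetical helper: label a title by whether the pattern is found in it."""
    return "Series" if pattern.search(title) else "Movie"


spaced_labels = [classify(t, spaced) for t in titles]
tight_labels = [classify(t, tight) for t in titles]

for title, before, after in zip(titles, spaced_labels, tight_labels):
    print(f"{title!r}: spaced={before}, tight={after}")
```

With the spaced pattern, only the "(Avnsitt 3)" row is mislabelled, because the character before "Avnsitt" is a bracket and not a space. The "(Episode 1)" row still matches because "Episode" happens to be followed by a space.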