I have a dataframe which includes the names of movie titles and TV Series.
From specific keywords I want to classify each row as Movie or Title according to these key words. However, due to brackets not having a space between the key words they are not being picked up by the str.contains()
funtion and I need to do a workaround.
This is my dataframe:
import pandas as pd
import numpy as np
watched_df = pd.DataFrame([['Love Death Robots (Episode 1)'],
['James Bond'],
['How I met your Mother (Avnsitt 3)'],
['random name'],
['Random movie 3 Episode 8383893']],
columns=['Title'])
watched_df.head()
To add the column that classifies the titles as TV series or Movies I have the following code.
watched_df["temporary_brackets_removed_title"] = watched_df['Title'].str.replace('(', '')
watched_df["Film_Type"] = np.where(watched_df.temporary_brackets_removed_title.astype(str).str.contains(pat = 'Episode | Avnsitt', case = False), 'Series', 'Movie')
watched_df = watched_df.drop('temporary_brackets_removed_title', 1)
watched_df.head()
Is there a simpler way to solve this without having to add and drop a column?
Maybe a str.contains
-like function that does not look at a string being the exact same but just containing the given word? Similar to how in SQL you have the "Like" functionality?
CodePudding user response:
You can use str.contains
and then map
the results:
watched_df['Film_Type'] = watched_df['Title'].str.contains(r'(?:Episode|Avnsitt)').map({True: 'Series', False: 'Movie'})
Output:
>>> watched_df
Title Film_Type
0 Love Death Robots (Episode 1) Series
1 James Bond Movie
2 How I met your Mother (Avnsitt 3) Series
3 random name Movie
4 Random movie 3 Episode 8383893 Series