Extract year from column with string of movie names

Time:01-15

I have the following data with two columns, "title name" and "gross", in a DataFrame called train_df:

gross       title name
760507625.0 Avatar (2009)
658672302.0 Titanic (1997)
652270625.0 Jurassic World (2015)
623357910.0 The Avengers (2012)
534858444.0 The Dark Knight (2008)
532177324.0 Rogue One (2016)
474544677.0 Star Wars: Episode I - The Phantom Menace (1999)
459005868.0 Avengers: Age of Ultron (2015)
448139099.0 The Dark Knight Rises (2012)
436471036.0 Shrek 2 (2004)
424668047.0 The Hunger Games: Catching Fire (2013)
423315812.0 Pirates of the Caribbean: Dead Man's Chest (2006)
415004880.0 Toy Story 3 (2010)
409013994.0 Iron Man 3 (2013)
408084349.0 Captain America: Civil War (2016)
408010692.0 The Hunger Games (2012)
403706375.0 Spider-Man (2002)
402453882.0 Jurassic Park (1993)
402111870.0 Transformers: Revenge of the Fallen (2009)
400738009.0 Frozen (2013)
381011219.0 Harry Potter and the Deathly Hallows: Part 2 (2011)
380843261.0 Finding Nemo (2003)
380262555.0 Star Wars: Episode III - Revenge of the Sith (2005)
373585825.0 Spider-Man 2 (2004)
370782930.0 The Passion of the Christ (2004)

I would like to remove the year in parentheses from "title name". The output should look as follows:

gross   title name
760507625.0 Avatar
658672302.0 Titanic
652270625.0 Jurassic World
623357910.0 The Avengers
534858444.0 The Dark Knight

The gross column can be ignored, as it needs no changes.

CodePudding user response:

Using str.replace we can try:

train_df["title name"] = train_df["title name"].str.replace(r'\s*\(\d{4}\)$', '', regex=True)
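
Since the question title also mentions extracting the year, here is a minimal sketch that keeps the year in its own column before stripping it from the title. It assumes every title ends in a four-digit year in parentheses; the "year" column name is hypothetical and not part of the original data.

```python
import pandas as pd

# Small sample copied from the question's data
train_df = pd.DataFrame({
    "gross": [760507625.0, 658672302.0, 652270625.0],
    "title name": ["Avatar (2009)", "Titanic (1997)", "Jurassic World (2015)"],
})

# Capture the four digits inside the trailing parentheses.
# "year" is a hypothetical column name chosen for this example.
train_df["year"] = (
    train_df["title name"]
    .str.extract(r"\((\d{4})\)$", expand=False)
    .astype(int)  # assumes every row has a year; NaN would make this fail
)

# Remove the trailing " (YYYY)" (optional whitespace, then the year)
train_df["title name"] = train_df["title name"].str.replace(
    r"\s*\(\d{4}\)$", "", regex=True
)

print(train_df)
```

If some titles might lack a year, dropping the `.astype(int)` step leaves missing values as NaN instead of raising an error.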

CodePudding user response:

Another solution, without regular expressions, using only .str.rsplit() (here df is the question's train_df):

df['title name'] = df['title name'].str.rsplit(' (', n=1).str[0]
print(df)

Prints:

          gross                                    title name
0   760507625.0                                        Avatar
1   658672302.0                                       Titanic
2   652270625.0                                Jurassic World
3   623357910.0                                  The Avengers
4   534858444.0                               The Dark Knight
5   532177324.0                                     Rogue One
6   474544677.0     Star Wars: Episode I - The Phantom Menace
7   459005868.0                       Avengers: Age of Ultron
8   448139099.0                         The Dark Knight Rises
9   436471036.0                                       Shrek 2
10  424668047.0               The Hunger Games: Catching Fire
11  423315812.0    Pirates of the Caribbean: Dead Man's Chest
12  415004880.0                                   Toy Story 3
13  409013994.0                                    Iron Man 3
14  408084349.0                    Captain America: Civil War
15  408010692.0                              The Hunger Games
16  403706375.0                                    Spider-Man
17  402453882.0                                 Jurassic Park
18  402111870.0           Transformers: Revenge of the Fallen
19  400738009.0                                        Frozen
20  381011219.0  Harry Potter and the Deathly Hallows: Part 2
21  380843261.0                                  Finding Nemo
22  380262555.0  Star Wars: Episode III - Revenge of the Sith
23  373585825.0                                  Spider-Man 2
24  370782930.0                     The Passion of the Christ
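
The two approaches agree on the data above, but they may differ on titles that do not end in a year. The sketch below compares them on a few hypothetical titles (not from the question's data): splitting on the last " (" removes any trailing parenthetical, while a pattern anchored on four digits only removes a year.

```python
import pandas as pd

# Hypothetical titles, made up for this comparison
titles = pd.Series([
    "Avatar (2009)",
    "Untitled Project",          # no year at all
    "Some Film (Extended Cut)",  # parentheses, but not a year
])

# Split once from the right on " (" and keep the left part
by_split = titles.str.rsplit(" (", n=1).str[0]

# Remove only a trailing four-digit year in parentheses
by_regex = titles.str.replace(r"\s*\(\d{4}\)$", "", regex=True)

print(pd.DataFrame({"original": titles, "rsplit": by_split, "regex": by_regex}))
```

Which behavior is preferable depends on the data; for a column where every title ends in "(YYYY)", as in the question, either works.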