I have the following data with two columns, "title name" and "gross", in a table called train_df:
gross title name
760507625.0 Avatar (2009)
658672302.0 Titanic (1997)
652270625.0 Jurassic World (2015)
623357910.0 The Avengers (2012)
534858444.0 The Dark Knight (2008)
532177324.0 Rogue One (2016)
474544677.0 Star Wars: Episode I - The Phantom Menace (1999)
459005868.0 Avengers: Age of Ultron (2015)
448139099.0 The Dark Knight Rises (2012)
436471036.0 Shrek 2 (2004)
424668047.0 The Hunger Games: Catching Fire (2013)
423315812.0 Pirates of the Caribbean: Dead Man's Chest (2006)
415004880.0 Toy Story 3 (2010)
409013994.0 Iron Man 3 (2013)
408084349.0 Captain America: Civil War (2016)
408010692.0 The Hunger Games (2012)
403706375.0 Spider-Man (2002)
402453882.0 Jurassic Park (1993)
402111870.0 Transformers: Revenge of the Fallen (2009)
400738009.0 Frozen (2013)
381011219.0 Harry Potter and the Deathly Hallows: Part 2 (2011)
380843261.0 Finding Nemo (2003)
380262555.0 Star Wars: Episode III - Revenge of the Sith (2005)
373585825.0 Spider-Man 2 (2004)
370782930.0 The Passion of the Christ (2004)
I would like to remove the year in parentheses from "title name". The output should look as follows:
gross title name
760507625.0 Avatar
658672302.0 Titanic
652270625.0 Jurassic World
623357910.0 The Avengers
534858444.0 The Dark Knight
Ignore the gross column as it needs no changing.
CodePudding user response:
Using str.replace, we can try:
train_df["title name"] = train_df["title name"].str.replace(r'\s*\(\d{4}\)$', '', regex=True)
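To see what the pattern matches, here is a small sketch that applies the same regular expression with Python's re module instead of pandas. The helper name strip_year and the title "No Year Here" are made up for illustration; the assumption is that every year is a four-digit number in parentheses at the very end of the string.

```python
import re

# Pattern, piece by piece:
#   \s*      optional whitespace before the parenthesis
#   \(       a literal "("
#   \d{4}    exactly four digits (the year)
#   \)       a literal ")"
#   $        end of the string, so only a trailing year is removed
YEAR_SUFFIX = re.compile(r"\s*\(\d{4}\)$")


def strip_year(title: str) -> str:
    """Remove a trailing '(YYYY)' from a title, if one is present."""
    return YEAR_SUFFIX.sub("", title)


titles = [
    "Avatar (2009)",
    "Shrek 2 (2004)",
    "Harry Potter and the Deathly Hallows: Part 2 (2011)",
    "No Year Here",  # hypothetical title without a year; left unchanged
]

cleaned = [strip_year(t) for t in titles]
print(cleaned)
```

Digits inside the title itself, such as the "2" in "Shrek 2", are not touched because the pattern only matches four digits wrapped in parentheses at the end.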
CodePudding user response:
Another solution, without re and using only .str.rsplit():
train_df['title name'] = train_df['title name'].str.rsplit(' (', n=1).str[0]
print(train_df)
Prints:
gross title name
0 760507625.0 Avatar
1 658672302.0 Titanic
2 652270625.0 Jurassic World
3 623357910.0 The Avengers
4 534858444.0 The Dark Knight
5 532177324.0 Rogue One
6 474544677.0 Star Wars: Episode I - The Phantom Menace
7 459005868.0 Avengers: Age of Ultron
8 448139099.0 The Dark Knight Rises
9 436471036.0 Shrek 2
10 424668047.0 The Hunger Games: Catching Fire
11 423315812.0 Pirates of the Caribbean: Dead Man's Chest
12 415004880.0 Toy Story 3
13 409013994.0 Iron Man 3
14 408084349.0 Captain America: Civil War
15 408010692.0 The Hunger Games
16 403706375.0 Spider-Man
17 402453882.0 Jurassic Park
18 402111870.0 Transformers: Revenge of the Fallen
19 400738009.0 Frozen
20 381011219.0 Harry Potter and the Deathly Hallows: Part 2
21 380843261.0 Finding Nemo
22 380262555.0 Star Wars: Episode III - Revenge of the Sith
23 373585825.0 Spider-Man 2
24 370782930.0 The Passion of the Christ
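One difference between the two answers is worth knowing: splitting on the last ' (' removes whatever trailing parenthetical follows, not just a year. The sketch below shows this with plain Python str.rsplit, which is what .str.rsplit applies to each element. The helper names and the title "Hypothetical Film (Extended Cut)" are invented for the example and do not come from the data above.

```python
import re


def strip_by_rsplit(title: str) -> str:
    """Drop everything from the last ' (' onward, whatever it contains."""
    return title.rsplit(" (", 1)[0]


def strip_by_regex(title: str) -> str:
    """Drop only a trailing four-digit year in parentheses."""
    return re.sub(r"\s*\(\d{4}\)$", "", title)


samples = [
    "Titanic (1997)",                    # both approaches agree
    "Hypothetical Film (Extended Cut)",  # made-up title: approaches differ
    "Plain Title",                       # no parentheses: both leave it alone
]

for s in samples:
    print(f"{s!r}: rsplit={strip_by_rsplit(s)!r}, regex={strip_by_regex(s)!r}")
```

For the data in the question every title ends in a year, so both approaches give the same result. The regex version is the safer choice if some rows might end in a different parenthetical.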