Why isn't replace working in pandas dataframe?

Time:06-04

I'm trying to parse some data, but I can't get .replace to remove the junk, i.e. [BluRay] and anything else that is not the year or the resolution. My end goal is to end up with three columns: Movie Name, Year and Resolution.

,Movie Name,others
0,James Bond The Spy Who Loved Me ,1977) [1080p]
1,James Bond Live And Let Die ,1973) [1080p]
2,No Time To Die ,2021) [1080p] [BluRay] [5.1] [YTS.MX]
3,James Bond The Man With The Golden Gun ,1974) [1080p]
4,Casino Royale ,2006) [2160p] [4K] [BluRay] [5.1] [YTS.MX]
5,James Bond Moonraker ,1979) [1080p]
6,James Bond Licence To Kill ,1989) [1080p]
7,James Bond A View To A Kill ,1985) [1080p]
8,James Bond The Living Daylights ,1987) [1080p]

The code I'm using is:

df['others']=df['others'].replace(to_replace=[['','BluRay']],value='')

Can anyone see where I'm going wrong?

CodePudding user response:

Given:

                                Movie Name                                      others
0         James Bond The Spy Who Loved Me                                1977) [1080p]
1             James Bond Live And Let Die                                1973) [1080p]
2                          No Time To Die        2021) [1080p] [BluRay] [5.1] [YTS.MX]
3  James Bond The Man With The Golden Gun                                1974) [1080p]
4                           Casino Royale   2006) [2160p] [4K] [BluRay] [5.1] [YTS.MX]
5                    James Bond Moonraker                                1979) [1080p]
6              James Bond Licence To Kill                                1989) [1080p]
7             James Bond A View To A Kill                                1985) [1080p]
8         James Bond The Living Daylights                                1987) [1080p]

To clean this up I would do:

df['Movie Name'] = df['Movie Name'].str.strip()
df[['Year', 'Resolution']] = df['others'].str.extract(r'(\d{4})\).*\[(.*p)]')

print(df[['Movie Name', 'Year', 'Resolution']])

Output:

                               Movie Name  Year Resolution
0         James Bond The Spy Who Loved Me  1977      1080p
1             James Bond Live And Let Die  1973      1080p
2                          No Time To Die  2021      1080p
3  James Bond The Man With The Golden Gun  1974      1080p
4                           Casino Royale  2006      2160p
5                    James Bond Moonraker  1979      1080p
6              James Bond Licence To Kill  1989      1080p
7             James Bond A View To A Kill  1985      1080p
8         James Bond The Living Daylights  1987      1080p
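
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds three of the rows from the question and applies the same kind of extraction. The named groups and the stricter `\d+p` pattern are my own variation on the answer above, not something the original code used, so treat them as assumptions.

```python
import pandas as pd

# Three sample rows copied from the question (trailing spaces included).
df = pd.DataFrame({
    "Movie Name": [
        "James Bond The Spy Who Loved Me ",
        "No Time To Die ",
        "Casino Royale ",
    ],
    "others": [
        "1977) [1080p]",
        "2021) [1080p] [BluRay] [5.1] [YTS.MX]",
        "2006) [2160p] [4K] [BluRay] [5.1] [YTS.MX]",
    ],
})

# Drop the trailing whitespace left over in the movie names.
df["Movie Name"] = df["Movie Name"].str.strip()

# Named groups become the column names of the extracted frame.
# Year: four digits before ")"; Resolution: digits followed by "p" inside brackets.
parts = df["others"].str.extract(r"(?P<Year>\d{4})\).*?\[(?P<Resolution>\d+p)\]")

# Attach the new columns and drop the raw "others" column.
df = df.join(parts).drop(columns="others")
print(df)
```

Note that Year comes back as a string column; convert it with `astype(int)` if you need numbers.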

CodePudding user response:

You can use a regex with .str.replace to strip the unwanted substrings:

df['others'] = df['others'].str.replace(r"BluRay|5\.1", '', regex=True)

Output:

   Unnamed: 0  Movie Name                              others
0           0  James Bond The Spy Who Loved Me         1977) [1080p]
1           1  James Bond Live And Let Die             1973) [1080p]
2           2  No Time To Die                          2021) [1080p] [] [] [YTS.MX]
3           3  James Bond The Man With The Golden Gun  1974) [1080p]
4           4  Casino Royale                           2006) [2160p] [4K] [] [] [YTS.MX]
5           5  James Bond Moonraker                    1979) [1080p]
6           6  James Bond Licence To Kill              1989) [1080p]
7           7  James Bond A View To A Kill             1985) [1080p]
8           8  James Bond The Living Daylights         1987) [1080p]
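
As for why the original call appeared to do nothing: by default, Series.replace compares each whole cell value against to_replace, so a cell that merely contains "BluRay" is left alone. Substring replacement needs either .str.replace or regex=True. The sketch below illustrates the difference on a two-element Series of my own making; the bracket-stripping pattern at the end is an assumption about which tags you want removed.

```python
import pandas as pd

s = pd.Series(["2021) [1080p] [BluRay] [5.1] [YTS.MX]", "BluRay"])

# Default Series.replace: only cells equal to "BluRay" are changed.
whole = s.replace("BluRay", "")

# With regex=True, Series.replace substitutes inside each string.
inside = s.replace("BluRay", "", regex=True)

# .str.replace also works on substrings; this pattern removes the whole
# bracketed tag plus the space before it, so no empty "[]" is left behind.
clean = s.str.replace(r"\s*\[(?:BluRay|5\.1|YTS\.MX)\]", "", regex=True)

print(whole.tolist())
print(inside.tolist())
print(clean.tolist())
```

Removing the brackets together with their contents avoids the leftover "[]" pairs visible in the output above.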