Remove a specific character from each value of a list in Python

Time:07-28

I have this list of movies, and I want to remove the dot "." from every title.

I can't just remove the first character of every value, because not all of them start with a dot ".".

['Sueños de fuga(1994)',
 'El padrino(1972)',
 'Citizen Kane(1941)',
 '12 hombres en pugna(1957)',
 'La lista de Schindler(1993)',
 'Lo bueno, lo malo y lo feo(1966)',
 'El imperio contraataca(1980)',
 'El señor de los anillos: El retorno del rey(2003)',
 'Batman - El caballero de la noche(2008)',
 '.El padrino II(1974)',
 '.Tiempos violentos(1994)',
 '.El club de la pelea(1999)',
 '.Psicosis(1960)',
 '.2001: Odisea del espacio(1968)',
 '.Metropolis(1927)',
 '.La guerra de las galaxias(1977)']

Also, the list is being scraped, so manually removing the dots won't work.

Here is the code I have so far:

from bs4 import BeautifulSoup
import requests
import pandas as pd
url = "https://www.imdb.com/list/ls024149810/"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
# scrape movie names
scraped_movies = soup.find_all('h3', class_='lister-item-header')

# parse movie names
movies = []
for movie in scraped_movies:
    movie = movie.get_text().replace('\n', "")
    movie = movie.strip(" ")
    movies.append(movie)

# remove the first two characters of each value on the list
movies = [e[2:] for e in movies]  

# remove the remaining dots "."
# (note: this only removes list items that are exactly ".", not dots inside a title)
while movies.count("."):
    movies.remove(".")

# print list
print(movies)
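A minimal sketch of why only some titles keep a dot. It assumes each header's text reads "<rank>.<title>(<year>)" once the newlines are removed; this matches the list above, where only entries from rank 10 onward start with a dot. Slicing off two characters removes "1." through "9." completely but leaves the dot behind for "10." and higher. Splitting once at the first dot with str.partition is one alternative; the function names and sample strings below are hypothetical.

```python
def strip_rank_slice(text: str) -> str:
    # The question's approach: drop exactly two characters ("1." for ranks 1-9)
    return text[2:]


def strip_rank_partition(text: str) -> str:
    # Split once at the first "." and keep everything after it,
    # so both "1." and "10." prefixes are removed.
    # Assumes every entry has a rank prefix; without a dot this returns "".
    _, _, title = text.partition(".")
    return title


# Hypothetical sample strings in the assumed "<rank>.<title>(<year>)" shape
samples = ["1.Sueños de fuga(1994)", "10.El padrino II(1974)", "14.2001: Odisea del espacio(1968)"]

print([strip_rank_slice(s) for s in samples])
# ['Sueños de fuga(1994)', '.El padrino II(1974)', '.2001: Odisea del espacio(1968)']
print([strip_rank_partition(s) for s in samples])
# ['Sueños de fuga(1994)', 'El padrino II(1974)', '2001: Odisea del espacio(1968)']
```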

Answer:

Try removing the dot with the replace method:

movie = movie.get_text().replace('\n', "").replace('.', "")
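A small sketch of how str.replace behaves here. It assumes the raw header text for rank 1 is "1.The Shawshank Redemption(1994)", which is consistent with the question's output. Without a count argument, replace removes every dot, including the one in the rank prefix. If the question's two-character slice is kept afterwards, it then cuts into the title. The "Dr. Strangelove" string is a hypothetical title used only to show the optional count argument.

```python
# Assumed raw header text for rank 1 (rank prefix "1." followed by the title)
raw = "1.The Shawshank Redemption(1994)"

# str.replace without a count removes every dot, including the one in "1."
no_dots = raw.replace(".", "")
print(no_dots)      # 1The Shawshank Redemption(1994)

# If the question's two-character slice is still applied afterwards,
# it now removes the first letter of the title
print(no_dots[2:])  # he Shawshank Redemption(1994)

# A third positional argument limits how many occurrences are replaced
title = ".Dr. Strangelove(1964)"  # hypothetical title with a dot of its own
print(title.replace(".", ""))     # Dr Strangelove(1964)
print(title.replace(".", "", 1))  # Dr. Strangelove(1964)
```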

Answer:

That should be a simple matter with a list comprehension: take your list of movie titles and replace the dot with an empty string. This builds a new movies list with the dotted starts removed (note that str.replace removes every dot in a title, not only a leading one).

# movies is the list of title strings built in the question's loop (not the bs4 Tag objects)
movies = [x.replace('.', '') for x in movies]

Output:

['Sueños de fuga(1994)', 'El padrino(1972)', 'Citizen Kane(1941)', '12 hombres en pugna(1957)', 'La lista de Schindler(1993)', 'Lo bueno, lo malo y lo feo(1966)', 'El imperio contraataca(1980)', 'El señor de los anillos: El retorno del rey(2003)', 'Batman - El caballero de la noche(2008)', 'El padrino II(1974)', 'Tiempos violentos(1994)', 'El club de la pelea(1999)', 'Psicosis(1960)', '2001: Odisea del espacio(1968)', 'Metropolis(1927)', 'La guerra de las galaxias(1977)']

If you are worried that a title might contain a dot somewhere other than the start, check each string with str.startswith('.') so that only a leading dot is removed.
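A minimal sketch of the startswith check as a conditional list comprehension, using a few titles from the question as sample data. On Python 3.9 and later, str.removeprefix performs the same check-and-slice in a single call; both lines below are shown as alternatives, not as the answer's exact code.

```python
movies = ['Citizen Kane(1941)', '.El padrino II(1974)', '.2001: Odisea del espacio(1968)']

# Drop the first character only when it is a dot; dots elsewhere are kept
cleaned = [m[1:] if m.startswith(".") else m for m in movies]

# Python 3.9+: str.removeprefix removes the prefix only if it is present
cleaned_39 = [m.removeprefix(".") for m in movies]

print(cleaned)  # ['Citizen Kane(1941)', 'El padrino II(1974)', '2001: Odisea del espacio(1968)']
```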

Answer:

You can try this:

# remove the remaining dots "." (only a leading dot is removed)
for i, word in enumerate(movies):
    if word.startswith("."):
        movies[i] = word[1:]
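The loop above updates the existing list object, whereas a list comprehension assigned to movies creates a new list. If the in-place behavior matters (for example, because another variable refers to the same list), slice assignment keeps the comprehension style while still mutating the original object. A minimal sketch; the alias variable is hypothetical.

```python
movies = ['Citizen Kane(1941)', '.Psicosis(1960)', '.Metropolis(1927)']
alias = movies  # a second name bound to the same list object

# Slice assignment replaces the list's contents in place, so `alias` sees the change too
movies[:] = [m[1:] if m.startswith(".") else m for m in movies]

print(alias)  # ['Citizen Kane(1941)', 'Psicosis(1960)', 'Metropolis(1927)']
```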

Or use a list comprehension. It removes the leading dot from every element that starts with one, leaves the other elements unchanged, and also works when no element starts with a dot.

# remove the remaining dots "."
movies = [word[1:] if word.startswith(".") else word for word in movies]

Edited code:

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://www.imdb.com/list/ls024149810/"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")

# scrape movie names
scraped_movies = soup.find_all('h3', class_='lister-item-header')

# parse movie names
movies = []
for movie in scraped_movies:
    movie = movie.get_text().replace('\n', "")
    movie = movie.strip(" ")
    movies.append(movie)

# remove the first two characters of each value on the list
movies = [e[2:] for e in movies]
print(movies)

# remove the remaining dots "."
movies = [word[1:] if word.startswith(".") else word for word in movies]

# print list
print(movies)

Output:

['The Shawshank Redemption(1994)', 'The Godfather(1972)', 'Citizen Kane(1941)', '12 Angry Men(1957)', "Schindler's List(1993)", 'Il buono, il brutto, il cattivo(1966)', 'The Empire Strikes Back(1980)', 'The Lord of the Rings: The Return of the King(2003)', 'The Dark Knight(2008)', '.The Godfather Part II(1974)', '.Pulp Fiction(1994)', '.Fight Club(1999)', '.Psycho(1960)', '.2001: A Space Odyssey(1968)', '.Metropolis(1927)', '.Star Wars(1977)', '.The Lord of the Rings: The Fellowship of the Ring(2001)', '.Terminator 2: Judgment Day(1991)', '.The Matrix(1999)', '.Raiders of the Lost Ark(1981)', '.Casablanca(1942)', '.The Wizard of Oz(1939)', '.Shichinin no samurai(1954)', '.Forrest Gump(1994)', '.Inception(2010)']
['The Shawshank Redemption(1994)', 'The Godfather(1972)', 'Citizen Kane(1941)', '12 Angry Men(1957)', "Schindler's List(1993)", 'Il buono, il brutto, il cattivo(1966)', 'The Empire Strikes Back(1980)', 'The Lord of the Rings: The Return of the King(2003)', 'The Dark Knight(2008)', 'The Godfather Part II(1974)', 'Pulp Fiction(1994)', 'Fight Club(1999)', 'Psycho(1960)', '2001: A Space Odyssey(1968)', 'Metropolis(1927)', 'Star Wars(1977)', 'The Lord of the Rings: The Fellowship of the Ring(2001)', 'Terminator 2: Judgment Day(1991)', 'The Matrix(1999)', 'Raiders of the Lost Ark(1981)', 'Casablanca(1942)', 'The Wizard of Oz(1939)', 'Shichinin no samurai(1954)', 'Forrest Gump(1994)', 'Inception(2010)']
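A different route, not taken by any of the answers above, is to strip the numeric rank prefix with a regular expression instead of slicing two characters and cleaning up the leftover dot afterwards. This sketch assumes the raw header text has the form "<rank>.<title>(<year>)"; the function name and sample strings are hypothetical. Because the pattern is anchored at the start and requires digits followed by a dot, titles that begin with a number, such as "12 Angry Men", are left intact.

```python
import re

# Matches a leading rank such as "1." or "10." (one or more digits followed by a dot)
RANK_PREFIX = re.compile(r"^\d+\.")


def strip_rank(text: str) -> str:
    """Remove a leading '<digits>.' rank prefix, leaving the rest of the text untouched."""
    return RANK_PREFIX.sub("", text)


# Hypothetical raw header texts in the assumed "<rank>.<title>(<year>)" shape
raw_titles = [
    "4.12 Angry Men(1957)",
    "9.The Dark Knight(2008)",
    "10.The Godfather Part II(1974)",
    "14.2001: A Space Odyssey(1968)",
]
print([strip_rank(t) for t in raw_titles])
# ['12 Angry Men(1957)', 'The Dark Knight(2008)', 'The Godfather Part II(1974)', '2001: A Space Odyssey(1968)']
```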