Python: Adding values to empty dictionary

Time:08-16

I have scraped data from a website and I would like to save all of it. However, only the last value gets saved. I have created an empty dictionary, but I'm struggling to add elements to it.

Here's my code

from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy

try:
    source = requests.get('https://www.imdb.com/chart/top/')
    source.raise_for_status()

    soup = BeautifulSoup(source.text,'html.parser')


    movies = soup.find('tbody', class_="lister-list").find_all('tr')    
    
    data = {}

    for movie in movies: 
        
        name = movie.find('td', class_='titleColumn').a.text
        
        rank = movie.find('td', class_="titleColumn").get_text(strip=True).split('.')[0] 

        year = movie.find('td', class_="titleColumn").span.text.strip('()')

        rating = movie.find('td', class_="ratingColumn imdbRating").strong.text
        
except Exception as e:
    print(e)

print(data)
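The posted loop never writes into data, so the "only the last value" symptom presumably comes from a version that stored each field under the same fixed keys. Assuming that, the network-free sketch below shows why: identical keys are overwritten on every pass, while giving each movie its own key keeps them all. ROWS is hypothetical sample data standing in for the scraped rows, with values borrowed from the output further down this page.

```python
# Hypothetical sample data: (rank, name, year, rating) strings standing in
# for the values extracted from each <tr> row.
ROWS = [
    ("1", "Die Verurteilten", "1994", "9.2"),
    ("2", "Der Pate", "1972", "9.2"),
    ("3", "The Dark Knight", "2008", "9"),
]


def same_keys(rows):
    """Stores every movie under the same fixed keys (the assumed bug)."""
    data = {}
    for rank, name, year, rating in rows:
        # "name", "rank", ... are identical on every pass, so each
        # assignment replaces the previous movie's value.
        data["name"] = name
        data["rank"] = rank
        data["year"] = year
        data["rating"] = rating
    return data


def one_key_per_movie(rows):
    """Stores each movie under its own key, so nothing is overwritten."""
    data = {}
    for rank, name, year, rating in rows:
        data[name] = {"rank": rank, "year": year, "rating": rating}
    return data


print(same_keys(ROWS))                # only The Dark Knight's values remain
print(len(one_key_per_movie(ROWS)))   # 3
```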

Answer:

You're close to your goal: build a dict with the information for each movie and append it to a list on every iteration. You can then create a DataFrame from that list:

data = []  # must be a list, not a dict, for .append() to work

for movie in movies:

    data.append({
        'name': movie.find('td', class_='titleColumn').a.text,
        'rank': movie.find('td', class_="titleColumn").get_text(strip=True).split('.')[0],
        'year': movie.find('td', class_="titleColumn").span.text.strip('()'),
        'rating': movie.find('td', class_="ratingColumn imdbRating").strong.text
    })
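The rank and year expressions above rely on plain string methods. Assuming the title cell's stripped text comes out as rank, title and bracketed year run together (for example 1.Die Verurteilten(1994); that exact shape is an assumption about IMDb's markup at the time), the parsing step reduces to the following, shown without BeautifulSoup. parse_title_cell is a hypothetical helper for illustration.

```python
def parse_title_cell(cell_text, span_text):
    """Mimics the answer's rank/year extraction on already-extracted strings.

    cell_text: what get_text(strip=True) is assumed to return for the cell.
    span_text: the text of the <span> holding the year, e.g. "(1994)".
    """
    # Everything before the first "." is the rank; dots later in the
    # title don't matter because only element [0] is kept.
    rank = cell_text.split(".")[0]
    # strip("()") removes leading/trailing "(" and ")" characters.
    year = span_text.strip("()")
    return rank, year


print(parse_title_cell("1.Die Verurteilten(1994)", "(1994)"))  # ('1', '1994')
```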
Example
from bs4 import BeautifulSoup
import requests
import pandas as pd

source = requests.get('https://www.imdb.com/chart/top/')
source.raise_for_status()

soup = BeautifulSoup(source.text,'html.parser')

movies = soup.find('tbody', class_="lister-list").find_all('tr')
data = []

for movie in movies:

    data.append({
        'name': movie.find('td', class_='titleColumn').a.text,
        'rank': movie.find('td', class_="titleColumn").get_text(strip=True).split('.')[0],
        'year': movie.find('td', class_="titleColumn").span.text.strip('()'),
        'rating': movie.find('td', class_="ratingColumn imdbRating").strong.text
    })

pd.DataFrame(data)

Output

                     name  rank  year  rating
0        Die Verurteilten     1  1994     9.2
1                Der Pate     2  1972     9.2
2         The Dark Knight     3  2008       9
3              Der Pate 2     4  1974       9
4  Die zwölf Geschworenen     5  1957     8.9

...
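The key point in this answer is the container type: data has to be a list for .append() to work, and each appended dict becomes one row. A stdlib-only sketch of that shape follows, using hypothetical (name, rank, year, rating) tuples in place of the scraped rows; passing the resulting records to pd.DataFrame is what produces a table like the one above. to_records is a hypothetical helper name.

```python
# Hypothetical stand-in for the scraped rows: (name, rank, year, rating).
SCRAPED = [
    ("Die Verurteilten", "1", "1994", "9.2"),
    ("Der Pate", "2", "1972", "9.2"),
    ("The Dark Knight", "3", "2008", "9"),
]


def to_records(rows):
    """Returns one dict per movie, in scrape order (a list of records)."""
    records = []  # a list: dicts have no .append() method
    for name, rank, year, rating in rows:
        records.append({"name": name, "rank": rank, "year": year, "rating": rating})
    return records


records = to_records(SCRAPED)
print(len(records))        # 3: one record per movie, none overwritten
print(records[1]["name"])  # Der Pate

# Appending to a dict fails, which is why data = {} does not work here.
try:
    {}.append({"name": "x"})
except AttributeError as exc:
    print(type(exc).__name__)  # AttributeError
```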

Answer:

You can replace your for loop with this one, which builds nested dictionaries, so you can look up a movie's info by its name and then pick the field you want:

for movie in movies:
    
    name = movie.find('td', class_='titleColumn').a.text

    data[name] = {}
    
    rank = movie.find('td', class_="titleColumn").get_text(strip=True).split('.')[0] 

    year = movie.find('td', class_="titleColumn").span.text.strip('()')

    rating = movie.find('td', class_="ratingColumn imdbRating").strong.text

    data[name]["rank"] = rank
    data[name]["year"] = year
    data[name]["rating"] = rating

print(data)
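With name-keyed nested dicts, lookups read naturally as data[name][field]. One trade-off worth knowing: dictionary keys are unique, so if two movies ever shared a title, the later one would silently replace the earlier. The sketch below reproduces the loop above on hypothetical sample rows (nest_by_name is an illustrative name, and the duplicate row is invented purely to show the overwrite).

```python
def nest_by_name(rows):
    """Builds {name: {"rank": ..., "year": ..., "rating": ...}} like the answer."""
    data = {}
    for name, rank, year, rating in rows:
        data[name] = {}  # a fresh inner dict per movie
        data[name]["rank"] = rank
        data[name]["year"] = year
        data[name]["rating"] = rating
    return data


# Hypothetical sample rows: (name, rank, year, rating).
rows = [
    ("Der Pate", "2", "1972", "9.2"),
    ("The Dark Knight", "3", "2008", "9"),
]
data = nest_by_name(rows)
print(data["Der Pate"]["year"])  # 1972

# Keys are unique: a second row with the same title replaces the first.
# (Hypothetical duplicate, not real chart data.)
dup = nest_by_name(rows + [("Der Pate", "99", "2000", "1.0")])
print(len(dup), dup["Der Pate"]["rank"])  # 2 99
```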

Answer:

I would suggest storing each movie's record (cur) in data, using the movie's name as the key:

from bs4 import BeautifulSoup
import requests

try:
    source = requests.get('https://www.imdb.com/chart/top/')
    source.raise_for_status()

    soup = BeautifulSoup(source.text,'html.parser')


    movies = soup.find('tbody', class_="lister-list").find_all('tr')    
    
    data = {}

    for movie in movies: 
        
        name = movie.find('td', class_='titleColumn').a.text
        
        rank = movie.find('td', class_="titleColumn").get_text(strip=True).split('.')[0] 

        year = movie.find('td', class_="titleColumn").span.text.strip('()')

        rating = movie.find('td', class_="ratingColumn imdbRating").strong.text
        cur = {
            'name': name,
            'rank': rank,
            'year': year,
            'rating': rating
        }
        # storing the cur movie in data but name of the movie as a key 
        data[name] = cur
        
except Exception as e:
    print(e)

print(data)
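Because each cur dict also carries its own 'name', the name-keyed data can be turned back into a list of records whenever a table is needed: list(data.values()) gives one dict per movie, and since Python 3.7 dicts preserve insertion order, so scrape order is kept. With pandas, pd.DataFrame(list(data.values())) or pd.DataFrame.from_dict(data, orient="index") would build the table. A stdlib-only sketch with hypothetical sample values:

```python
# Hypothetical sample in the shape this answer builds: {name: cur}.
data = {
    "Die Verurteilten": {
        "name": "Die Verurteilten", "rank": "1", "year": "1994", "rating": "9.2",
    },
    "Der Pate": {
        "name": "Der Pate", "rank": "2", "year": "1972", "rating": "9.2",
    },
}

# Direct lookup by title, the main benefit of using the name as the key.
print(data["Der Pate"]["rating"])  # 9.2

# Back to a list of records (insertion order is preserved in Python 3.7+).
records = list(data.values())
print([r["rank"] for r in records])  # ['1', '2']
```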