Webscraping question on appending several values to a single row with beautifulsoup

Time:11-24

Let's say I want to scrape IMDb for the top 10 movies. I would like to fetch the title of each movie and its cast members.

I can easily fetch the movie titles and append them to a list. The problem is that I don't know how to append several values to a single row. Say the first movie has 3 actors and the second has 5: how can I append the actors to a list so that the 3 actors from the first movie are in row 1, the 5 actors from the second movie are in row 2, and so on?

CodePudding user response:

Just a general approach, because no code is provided in your question.

Request the website (for example, the top 250 movies) and cook your soup:

import requests
from bs4 import BeautifulSoup
response = requests.get('http://www.imdb.com/chart/top')
soup = BeautifulSoup(response.text, 'lxml')

Create an empty list to store your results:

data = []

Iterate over the result set of your selection (for example, the top 250 movies) and append one dict per iteration to your list, so that all values for a movie stay together in a single row:

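# Each link's title attribute is expected to hold 'Director (dir.), Actor A, Actor B'
for e in soup.select('.titleColumn a'):
    data.append({
        'title': e.text,
        # strip() removes the whitespace left around the split point
        'director': e['title'].split('(dir.),')[0].strip(),
        'actors': e['title'].split('(dir.),')[-1].strip()
    })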
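
If each actor should be a separate value rather than part of one comma-separated string, the credits text can be split into a list, so every movie's row carries a list of whatever length its cast has. The following is only a sketch: the helper name split_credits and the sample strings are assumptions for illustration, written in the 'Director (dir.), Actor A, Actor B' format that the split above relies on.

```python
def split_credits(credit_text):
    """Split 'Director (dir.), Actor A, Actor B' into (director, [actors])."""
    # partition() returns the text before and after the first '(dir.),' marker;
    # if the marker is missing, the cast part is empty and the list stays empty.
    director, _, cast = credit_text.partition("(dir.),")
    actors = [name.strip() for name in cast.split(",") if name.strip()]
    return director.strip(), actors


# Hypothetical sample input standing in for e.text and e['title'].
samples = [
    ("Die Verurteilten", "Frank Darabont (dir.), Tim Robbins, Morgan Freeman"),
    ("Der Pate", "Francis Ford Coppola (dir.), Marlon Brando, Al Pacino"),
]

data = []
for title, credit_text in samples:
    director, actors = split_credits(credit_text)
    # One dict per movie: the 'actors' list may have any length.
    data.append({"title": title, "director": director, "actors": actors})

for row in data:
    print(row["title"], row["actors"])
```

In the scraping loop, the same helper could be called as split_credits(e['title']) in place of the two split() calls.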

Print your data, or create a data frame from your list of dicts:

import pandas as pd
pd.DataFrame(data)

Output

    title               director                actors
0   Die Verurteilten    Frank Darabont          Tim Robbins, Morgan Freeman
1   Der Pate            Francis Ford Coppola    Marlon Brando, Al Pacino
2   Der Pate 2          Francis Ford Coppola    Al Pacino, Robert De Niro
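
If one row per actor is preferred over one row per movie, the records can be flattened into (title, actor) pairs. The sketch below uses plain Python and takes its sample records from the output above; the function name one_row_per_actor is an assumption for illustration. pandas offers DataFrame.explode for the same kind of reshaping once a column holds lists.

```python
# Sample records shaped like the list of dicts built above,
# with 'actors' still stored as a comma-separated string.
data = [
    {"title": "Die Verurteilten", "director": "Frank Darabont",
     "actors": "Tim Robbins, Morgan Freeman"},
    {"title": "Der Pate", "director": "Francis Ford Coppola",
     "actors": "Marlon Brando, Al Pacino"},
]


def one_row_per_actor(records):
    """Return a list of (title, actor) pairs, one per cast member."""
    return [
        (rec["title"], name.strip())
        for rec in records
        for name in rec["actors"].split(",")
        if name.strip()  # skip empty pieces, e.g. from a trailing comma
    ]


for title, actor in one_row_per_actor(data):
    print(f"{title}\t{actor}")
```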