Webscraping: Loop through multiple urls

Time:09-23

I have successfully scraped several websites individually. Now I want a single script so that I don't have to run each scraper separately every time. I would like to build a for loop that goes through all the pages and substitutes a different string for x on each pass. Unfortunately, the pages are not numbered, so I can't step through them with "for x in range(...)"; there are only the strings shown below.

Here is my current code:

from bs4 import BeautifulSoup
import requests
import pandas as pd
    

movielist = []

for x in ... ('action', 'comedy', 'thriller', 'drama', 'sport'): # what should i insert instead of ...?
    r = requests.get(f'https://movie.com/{x}', headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    spiele = soup.find_all('div', {'class': 'row'})

The site is not real; it's just a question of how to do this.

I would really appreciate your help. Thank you very much.

User response:

Just remove the "...". Your tuple is iterable, so you can loop over its elements directly:

for x in ('action', 'comedy', 'thriller', 'drama', 'sport'):
    print(f'https://movie.com/{x}')

output:

https://movie.com/action
https://movie.com/comedy
https://movie.com/thriller
https://movie.com/drama
https://movie.com/sport
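Any iterable of strings can drive the loop, not only a tuple. A minimal sketch, assuming the placeholder address https://movie.com/ from the question, that uses a list instead and builds all URLs up front with a list comprehension (BASE_URL and genres are illustrative names, not from the original code):

```python
# Placeholder base URL from the question; movie.com is not a real target site.
BASE_URL = 'https://movie.com/'

# A list works exactly like the tuple: any iterable of strings can drive the loop.
genres = ['action', 'comedy', 'thriller', 'drama', 'sport']

# Build one URL per genre with a list comprehension.
urls = [f'{BASE_URL}{genre}' for genre in genres]

for url in urls:
    print(url)
```

Building the URL list first makes it easy to inspect or log the targets before any request is sent.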

User response:

You were pretty close! Just iterate through the movielist tuple, and x will hold the next element on each iteration.

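from bs4 import BeautifulSoup
import requests
import pandas as pd

# Assumed request headers: the question's code uses "headers" without defining it.
headers = {'User-Agent': 'Mozilla/5.0'}

movielist = ('action', 'comedy', 'thriller', 'drama', 'sport')

for x in movielist:
    url = f'https://movie.com/{x}'  # build the URL for this genre
    print(url)
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    spiele = soup.find_all('div', {'class': 'row'})  # every <div class="row"> on the page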

With this slight change, it will make the requests, parse each page with BeautifulSoup, and print the constructed URLs to the console:

https://movie.com/action
https://movie.com/comedy
https://movie.com/thriller
https://movie.com/drama
https://movie.com/sport

However, you should probably store the result of each iteration somewhere, for example by appending "spiele" to a list, or otherwise process it inside the loop; as written, each pass overwrites the previous value.
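One way to follow that advice is to keep each iteration's result in a dictionary keyed by genre. This is an illustrative sketch, not the answerer's code: scrape_all and fake_fetch are hypothetical names, the page download is passed in as a function so the loop can be tried without network access, and movie.com is still the placeholder site from the question.

```python
from typing import Callable, Dict, Iterable, List


def scrape_all(genres: Iterable[str],
               fetch_rows: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Call fetch_rows once per genre URL and keep every result, keyed by genre."""
    results: Dict[str, List[str]] = {}
    for genre in genres:
        url = f'https://movie.com/{genre}'  # placeholder site from the question
        results[genre] = fetch_rows(url)    # store this iteration's result
    return results


# A real fetcher might look like this (not executed here; it needs requests,
# bs4, and a headers dict like the one passed to requests.get in the loop):
# def fetch_rows(url):
#     r = requests.get(url, headers=headers, timeout=10)
#     r.raise_for_status()
#     soup = BeautifulSoup(r.text, 'html.parser')
#     return [div.get_text(strip=True)
#             for div in soup.find_all('div', {'class': 'row'})]


# Hypothetical stand-in fetcher so the sketch runs without network access.
def fake_fetch(url: str) -> List[str]:
    return [f'row from {url}']


results = scrape_all(('action', 'comedy'), fake_fetch)
print(results)
```

Once everything is collected, the dictionary can be flattened into (genre, row) pairs and passed to pd.DataFrame(rows, columns=['genre', 'row']) if you want a single pandas table.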
