I have successfully scraped several websites individually. Now I want a single script so that I don't have to run each one separately every time. I would like to build a for loop that goes through all the pages and replaces the x in the URL with a string. Unfortunately, the pages are not numbered, so I can't step through them with "for x in range"; there are only the strings shown in the code.
Here is my current code:
from bs4 import BeautifulSoup
import requests
import pandas as pd
movielist = []
for x in ... ('action', 'comedy', 'thriller', 'drama', 'sport'): # what should i insert instead of ...?
    r = requests.get(f'https://movie.com/{x}', headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    spiele = soup.find_all('div', {'class': 'row'})
The site is not real; the question is only about how to write the loop.
I would appreciate your help, thank you very much.
CodePudding user response:
Just remove the "...". Your tuple is iterable, so you can go through every element like this:
for x in ('action', 'comedy', 'thriller', 'drama', 'sport'):
    print(f'https://movie.com/{x}')
Output:
https://movie.com/action
https://movie.com/comedy
https://movie.com/thriller
https://movie.com/drama
https://movie.com/sport
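To connect this to the "for x in range" idea from the question, here is a minimal sketch comparing direct iteration with an index-based loop. The names categories, direct, indexed and page_number are made up for illustration, and movie.com is the placeholder site from the question.

```python
categories = ('action', 'comedy', 'thriller', 'drama', 'sport')

# Direct iteration: x is bound to each string in turn, no numbers needed.
direct = [f'https://movie.com/{x}' for x in categories]

# Index-based equivalent, shown only to illustrate what range() would do here.
indexed = [f'https://movie.com/{categories[i]}' for i in range(len(categories))]

# If a running number is useful (e.g. for progress output), enumerate()
# yields the counter and the string together.
for page_number, x in enumerate(categories, start=1):
    print(page_number, f'https://movie.com/{x}')
```

Both lists contain the same URLs, so the index adds nothing unless you need the position itself.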
CodePudding user response:
You were pretty close! Just iterate over the movielist tuple; x takes the next element on each iteration.
from bs4 import BeautifulSoup
import requests
import pandas as pd
# Placeholder request headers (an assumption; the question never defines `headers`)
headers = {'User-Agent': 'Mozilla/5.0'}
movielist = ('action', 'comedy', 'thriller', 'drama', 'sport')
for x in movielist:
    print(f'https://movie.com/{x}')
    r = requests.get(f'https://movie.com/{x}', headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    spiele = soup.find_all('div', {'class': 'row'})
With this small change, the script requests each page, parses it with BeautifulSoup, and prints the constructed URLs to the console:
https://movie.com/action
https://movie.com/comedy
https://movie.com/thriller
https://movie.com/drama
https://movie.com/sport
However, "spiele" is overwritten on every iteration, so you should store the result of each pass somewhere, for example by appending it to a list, or process it inside the loop.
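One possible way to keep the results is sketched below. It is only an illustration under assumptions: fetch_rows and scrape_categories are hypothetical helper names, the 10-second timeout is arbitrary, and each "row" div is reduced to its text because the question does not say which fields are needed. The fetch step is passed in as a parameter so the loop can be tried without a network connection.

```python
def fetch_rows(url, headers=None):
    """Download one page and return the text of each <div class="row">."""
    # Imported inside the function so the collecting loop below can be
    # exercised without these packages or network access.
    import requests
    from bs4 import BeautifulSoup

    r = requests.get(url, headers=headers, timeout=10)
    r.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(r.text, 'html.parser')
    return [div.get_text(strip=True) for div in soup.find_all('div', {'class': 'row'})]


def scrape_categories(categories, fetch=fetch_rows):
    """Loop over the category strings and collect one record per row."""
    movielist = []
    for x in categories:
        url = f'https://movie.com/{x}'
        for text in fetch(url):
            # Tag every row with its category so the origin is not lost.
            movielist.append({'category': x, 'url': url, 'text': text})
    return movielist
```

Calling scrape_categories(('action', 'comedy', 'thriller', 'drama', 'sport')) would return a single list covering all pages. Since the question already imports pandas, pd.DataFrame(movielist) should turn that list of dicts into a table.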