I'm working on a web-scraping script and it works fine. Now I want to replace the hard-coded URL with a CSV file that contains thousands of URLs. The file looks like this:
url1
url2
url3
.
.
.
urlX
The start of my web-scraping code is basic:
from bs4 import BeautifulSoup
import requests
from csv import writer
url= "HERE THE URL FROM EACH LINE OF THE CSV FILE"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
How can I tell Python to use the URLs from the CSV file instead? I thought about using a dictionary, but I don't really know how to do that. Does anyone have a solution? I know it probably seems very simple to you, but it would be very useful to me.
Answer:
If the file is just a list of URLs, one per line, you don't really need the csv module. But here is a solution that assumes the URL is in column 0 of each row. You want a csv reader, not a writer, and then it's a simple case of iterating over the rows and acting on each one.
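from bs4 import BeautifulSoup
import requests
import csv

# Assumes url-collection.csv holds one URL in the first column of each row
with open("url-collection.csv", newline="") as fileobj:
    for row in csv.reader(fileobj):
        if not row:
            continue  # blank lines come back as [], and row[0] would raise IndexError
        # TODO: add try/except to handle errors
        url = row[0]
        page = requests.get(url)
        soup = BeautifulSoup(page.content, 'html.parser')
        # work with soup here, once per URL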
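The TODO above could be filled in along these lines. This is a minimal sketch, not the answerer's code: the function names `read_urls`, `fetch_page` and `scrape_all`, the 10-second timeout, and the choice to treat HTTP error statuses as failures are all assumptions. It also picks up the dictionary idea from the question: pages that load go into one dict keyed by URL and failures into another, so one bad URL among thousands does not stop the run. `requests` and `bs4` are imported inside `fetch_page` so the reading and looping parts can be tried without them installed.

```python
import csv


def read_urls(path):
    """Yield the URL in column 0 of each non-empty row of a CSV file.

    A plain one-URL-per-line file is also valid CSV, so this handles both.
    """
    with open(path, newline="", encoding="utf-8") as fileobj:
        for row in csv.reader(fileobj):
            # Skip blank lines ([]) and rows whose first cell is only whitespace.
            if row and row[0].strip():
                yield row[0].strip()


def fetch_page(url, timeout=10):
    """Download one page and parse it (hypothetical helper; timeout is an assumption)."""
    import requests
    from bs4 import BeautifulSoup

    page = requests.get(url, timeout=timeout)
    page.raise_for_status()  # raise requests.HTTPError on 4xx/5xx responses
    return BeautifulSoup(page.content, "html.parser")


def scrape_all(urls, fetch=fetch_page):
    """Fetch every URL, collecting successes and failures in dicts keyed by URL."""
    results = {}
    errors = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as exc:
            # requests.RequestException subclasses OSError (IOError), so this
            # catches connection errors, malformed URLs and HTTP error statuses.
            errors[url] = exc
    return results, errors
```

For the file in the question, `results, errors = scrape_all(read_urls("url-collection.csv"))` would fetch every URL and leave the ones that failed in `errors`, ready to retry or check by hand. Passing a different `fetch` function is also a convenient way to test the loop without touching the network.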