I am pretty new to Python and I am testing my first scraper (using some code I found here and there). I was able to write the CSV with all the info needed, but now that I am trying to input more than one URL, the script only writes the last URL in the array; it's as if it is not appending new rows but just overwriting the same first rows.
I looked everywhere and tried a lot of things, but I think I need some help, thanks!
from bs4 import BeautifulSoup
import requests
from csv import writer

urls = ['https://example.com/1', 'https://example.com/2']

for url in urls:
    my_url = requests.get(url)
    html = my_url.content
    soup = BeautifulSoup(html, 'html.parser')
    info = []
    print(urls)
    lists = soup.find_all('div', class_="profile-info-holder")
    links = soup.find_all('a', class_="intercept")
    with open('multi.csv', 'w', encoding='utf8', newline='') as f:
        thewriter = writer(f)
        header = ['Name', 'Location', 'Link', 'Link2', 'Link3']
        thewriter.writerow(header)
        for list in lists:
            name = list.find('div', class_="profile-name").text
            location = list.find('div', class_="profile-location").text
            social1 = links[0]
            social2 = links[1]
            social3 = links[2]
            info = [name, location, social1.get('href'), social2.get('href'), social3.get('href')]
            thewriter.writerow(info)
CodePudding user response:
Basic approach
- Open the file in append mode ('a').
- The write cursor then points to the end of the file.
- Append a '\n' at the end of the file using the write() function.
- Append the given line to the file using the write() function.
- Close the file.
--
with open('multi.csv', 'a', encoding='utf8', newline='') as f:
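To see what the mode change does, here is a minimal, self-contained sketch (demo.csv and the sample rows are made up for illustration): 'w' truncates the file on every open, so opening inside a loop keeps only the last iteration, while 'a' picks up where the file ends.

from csv import writer

# With 'w' inside the loop, demo.csv would be truncated on every pass,
# leaving only ['second'] in the file. With 'a', both rows survive.
for row in (['first'], ['second']):
    with open('demo.csv', 'a', encoding='utf8', newline='') as f:
        writer(f).writerow(row)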
You may have to arrange your loops another way, but without the real URLs
it is hard to say exactly:
from bs4 import BeautifulSoup
import requests
from csv import writer

urls = ['https://example.com/1', 'https://example.com/2']

# Open the file once, before the URL loop, so every iteration appends
# to the same writer instead of truncating the file on each pass.
with open('multi.csv', 'a', encoding='utf8', newline='') as f:
    thewriter = writer(f)
    header = ['Name', 'Location', 'Link', 'Link2', 'Link3']
    thewriter.writerow(header)
    for url in urls:
        my_url = requests.get(url)
        html = my_url.content
        soup = BeautifulSoup(html, 'html.parser')
        lists = soup.find_all('div', class_="profile-info-holder")
        for l in lists:
            name = l.find('div', class_="profile-name").text
            location = l.find('div', class_="profile-location").text
            # Look up the links inside each profile block, not the whole
            # page, so the three social links belong to this profile.
            links = l.find_all('a', class_="intercept")
            social1 = links[0]
            social2 = links[1]
            social3 = links[2]  # assumes each profile has at least three links
            info = [name, location, social1.get('href'), social2.get('href'), social3.get('href')]
            thewriter.writerow(info)
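One thing to keep in mind: because the file is opened in append mode, running the script twice leaves two header rows in multi.csv. A sketch of one way around that, assuming you want a fresh file on each run (the placeholder rows below stand in for the scraped data): open the file once in 'w' mode before the loop, so the file is replaced per run but never truncated mid-loop.

from csv import writer

rows = [['Alice', 'Berlin', 'a', 'b', 'c']]  # placeholder for the scraped data

# 'w' here is safe because the file is opened exactly once, outside any
# loop, so the header is written a single time and nothing is overwritten.
with open('multi.csv', 'w', encoding='utf8', newline='') as f:
    thewriter = writer(f)
    thewriter.writerow(['Name', 'Location', 'Link', 'Link2', 'Link3'])
    for row in rows:
        thewriter.writerow(row)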