How to append a new row in Python (BeautifulSoup) in a CSV for a multi-url scraper?

Time:09-17

I am pretty new to Python and I am testing my first scraper (built from snippets I found here and there). I was able to write the CSV with all the info needed, but when I input more than one URL the script only writes data for the last URL in the array; it's as if it isn't appending new rows but just re-writing over the same first row.

I looked everywhere and tried a lot of things, but I think I need some help, thanks!

from bs4 import BeautifulSoup
import requests
from csv import writer

urls = ['https://example.com/1', 'https://example.com/2']

for url in urls:
    my_url = requests.get(url)
    html = my_url.content
    soup = BeautifulSoup(html,'html.parser')

    info = []

print (urls)

lists = soup.find_all('div', class_="profile-info-holder")
links = soup.find_all('a', class_="intercept")

with open('multi.csv', 'w', encoding='utf8', newline='') as f:
    thewriter = writer(f)
    header = ['Name', 'Location', 'Link', 'Link2', 'Link3']
    thewriter.writerow(header)

    for list in lists:
        name = list.find('div', class_="profile-name").text
        location = list.find('div', class_="profile-location").text

        social1 = links[0]
        social2 = links[1]
        social3 = links[2]

        info = [name, location, social1.get('href'),social2.get('href'),social3.get('href')]
        thewriter.writerow(info)

CodePudding user response:

Basic approach

  • Open the file in append mode ('a') instead of write mode ('w').
  • In append mode the write cursor points to the end of the file, so existing content is kept.
  • In write mode the file is truncated every time it is opened, which is why only the last data survived.
  • Each writerow() call then adds a new row after the existing ones.
  • The with block closes the file automatically.
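The difference between the two modes is the core of the problem: 'w' truncates the file on every open, while 'a' preserves what is already there. A minimal sketch (using a throwaway file name for illustration):

```python
from csv import writer

# mode 'w' truncates: after these two blocks the file holds only one row
with open('demo.csv', 'w', newline='') as f:
    writer(f).writerow(['first'])
with open('demo.csv', 'w', newline='') as f:
    writer(f).writerow(['second'])  # 'first' is gone now

# mode 'a' appends: the existing row survives
with open('demo.csv', 'a', newline='') as f:
    writer(f).writerow(['third'])

with open('demo.csv') as f:
    print(f.read().splitlines())  # ['second', 'third']
```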

--

with open('multi.csv', 'a', encoding='utf8', newline='') as f:

You may have to arrange your loops differently; without the real URLs it is hard to test, but the key change is to open the file once and do all the fetching and parsing inside the with block:

from bs4 import BeautifulSoup
import requests
from csv import writer

urls = ['https://example.com/1', 'https://example.com/2']


with open('multi.csv', 'a', encoding='utf8', newline='') as f:
    thewriter = writer(f)
    header = ['Name', 'Location', 'Link', 'Link2', 'Link3']
    thewriter.writerow(header)

    
    for url in urls:
        # fetch and parse each URL inside the loop, so every page gets scraped
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')

        lists = soup.find_all('div', class_="profile-info-holder")

        for lst in lists:
            name = lst.find('div', class_="profile-name").text
            location = lst.find('div', class_="profile-location").text
            # look up the social links inside this profile block
            links = lst.find_all('a', class_="intercept")
            # pad so profiles with fewer than three links don't raise IndexError
            hrefs = [a.get('href') for a in links[:3]] + [None] * (3 - len(links))

            thewriter.writerow([name, location, *hrefs])
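One caveat with mode 'a': the header row is written again on every run of the script. A common workaround (a sketch, checking the file size before writing the header) is:

```python
import os
from csv import writer

path = 'multi.csv'
# write the header only when the file is new or still empty
write_header = not os.path.exists(path) or os.path.getsize(path) == 0

with open(path, 'a', encoding='utf8', newline='') as f:
    thewriter = writer(f)
    if write_header:
        thewriter.writerow(['Name', 'Location', 'Link', 'Link2', 'Link3'])
    # ... the scraping loop writes its data rows here ...
```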