Why does my web scraper write everything into a single line?

Time:10-27

I'm a complete newbie, but I've managed to scrape EAN numbers with Python from a list of links created by an upstream piece of code. However, my output file contains all the scraped numbers as one continuous line instead of one EAN per line.

Here's my code - what's wrong with it? (scraped URL redacted)

import requests
from bs4 import BeautifulSoup
import urllib.request
import os

subpage = 1

while subpage <= 2:
    URL = "https://..." + str(subpage)
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, "html.parser")

    """writes all links under the h2 tag into a list"""
    links = []
    h2s = soup.find_all("h2")
    for h2 in h2s:
        links.append("http://www.xxxxxxxxxxx.com" + h2.a['href'])

    """opens links from list and extracts EAN number from underlying page"""
    with open("temp.txt", "a") as output:
        for link in links:
            urllib.request.urlopen(link)
            page_2 = requests.get(link)
            soup_2 = BeautifulSoup(page_2.content, "html.parser")
            if "EAN:" in soup_2.text:
                span = soup_2.find(class_="articleData_ean")
                EAN = span.a.text
                output.write(EAN)
        subpage += 1

os.replace('temp.txt', 'EANs.txt')

Answer:

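output.write(EAN) writes each EAN with nothing between them. write() does not add a separator or newline automatically, so you have to add one yourself. For a newline, call output.write('\n') after each EAN, or write EAN + '\n' in a single call. You could also use a comma or any other separator.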
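
As a minimal sketch of that fix, the snippet below appends a newline to every value before writing it. The helper name write_eans, the sample EAN values, and the temporary output path are assumptions made up for illustration; they are not part of the original scraper. The strip() call is an extra precaution, on the assumption that text pulled from HTML may carry stray whitespace.

```python
import os
import tempfile


def write_eans(path, eans):
    """Append each EAN to the file at `path`, one per line."""
    with open(path, "a") as output:
        for ean in eans:
            # write() adds nothing on its own, so the newline is appended here.
            # strip() removes whitespace that may surround the scraped text.
            output.write(ean.strip() + "\n")


# Made-up sample values standing in for the scraped EANs.
sample_eans = ["4006381333931", " 5901234123457 "]

# Hypothetical output location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "EANs.txt")
write_eans(path, sample_eans)

with open(path) as f:
    lines = f.read().splitlines()

print(lines)
```

In the original loop, the equivalent change would be output.write(EAN + "\n"). Another option is print(EAN, file=output), since print() ends each call with a newline by default.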
