Python BeautifulSoup - scraping multiple pages and exporting the result to CSV

Time:10-10

I want to scrape some information from several different pages. The code below lets me display the scraped information with the print() function.

The problem is that I only get the data from the last page: the results from the previous pages are never written to the CSV file. What should I do? Thanks.

The code:

import requests
from csv import writer
from bs4 import BeautifulSoup

urls = ['https://www.xxxxxxxxxxxxxxx/02-nb.php','https://www.xxxxxxxxxxxxxxx/03-np.php','https://www.xxxxxxxxxxxxxxx/04-nb.php']

for index,url in enumerate(urls):
    requests.get(url)
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'lxml')
    print(soup)
    table_data = soup.find('table')

with open("words.csv", "wt",newline='',encoding='utf-8') as csv_file:
    csv_data = writer(csv_file, delimiter =',')
    for voc in table_data.find_all('tr'):
        row_data = voc.find_all('td')
        row = [tr.text for tr in row_data]
        csv_data.writerow(row)

CodePudding user response:

You're iterating through every URL, but the logic that writes the data to the CSV is outside that for loop, so only the last page's data ends up in the file. I believe what you want is:

for index, url in enumerate(urls):
    page = requests.get(url)          # one request per URL is enough
    soup = BeautifulSoup(page.text, 'lxml')
    table_data = soup.find('table')

    if index == 0:
        # first page: "wt" creates (or truncates) the file
        with open("words.csv", "wt", newline='', encoding='utf-8') as csv_file:
            csv_data = writer(csv_file, delimiter=',')
            for voc in table_data.find_all('tr'):
                row_data = voc.find_all('td')
                row = [td.text for td in row_data]
                csv_data.writerow(row)
    else:
        # later pages: "a" appends to the existing file
        with open("words.csv", "a", newline='', encoding='utf-8') as csv_file:
            csv_data = writer(csv_file, delimiter=',')
            for voc in table_data.find_all('tr'):
                row_data = voc.find_all('td')
                row = [td.text for td in row_data]
                csv_data.writerow(row)

This writes to words.csv on every iteration through urls, instead of only writing the last page's data after the loop has finished.
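A tidier variant avoids the if/else duplication entirely: open the file once in "w" mode before the loop and keep writing rows inside it. The sketch below shows just that file-handling pattern, with stubbed page data standing in for the real requests/BeautifulSoup calls (the row values are made up for illustration):

```python
from csv import writer

# Stand-ins for the rows extracted from each page's <table>
# (in the real script these come from table_data.find_all('tr')).
pages = [
    [["alpha", "1"], ["beta", "2"]],   # rows from page 1
    [["gamma", "3"]],                  # rows from page 2
]

# Open once in "w" mode; the file handle stays open for the whole loop,
# so every page's rows end up in the same file.
with open("words.csv", "w", newline="", encoding="utf-8") as csv_file:
    csv_data = writer(csv_file, delimiter=",")
    for rows in pages:
        for row in rows:
            csv_data.writerow(row)
```

Because the file is opened exactly once, there is no need to branch on the loop index or to reopen the file in append mode.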

CodePudding user response:

    with open("words.csv", "a", newline='', encoding='utf-8') as csv_file:
        csv_data = writer(csv_file, delimiter=',')
        for voc in table_data.find_all('tr'):
            row_data = voc.find_all('td')
            row = [td.text for td in row_data]
            csv_data.writerow(row)

This block of code should be indented one level to the right, as shown, so that it runs on each iteration of the for loop. Also note that the open mode should be "a", which stands for "append"; in "w" mode you overwrite the file every time.
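To see the difference between the two modes, here is a quick stdlib-only sketch (the file name is arbitrary):

```python
# "w" truncates: each open starts from an empty file.
with open("demo.txt", "w") as f:
    f.write("first\n")
with open("demo.txt", "w") as f:
    f.write("second\n")   # "first" is gone now

# "a" appends: existing contents are kept.
with open("demo.txt", "a") as f:
    f.write("third\n")

with open("demo.txt") as f:
    print(f.read())   # "second" then "third" - "first" was overwritten
```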
