Python with Selenium erases all my Excel data each time and I only get the last page's table

Time: 07-14

I want to retrieve data from the website below: I want to get the table information from all pages and put it into an Excel file. The script goes through all the pages, but on each iteration it erases the Excel sheet and rewrites it, so at the end I only have the last page's table, not all of them. Could you help me?

site: https://www150.statcan.gc.ca/n1/pub/71-607-x/2021004/imp-eng.htm?r1=(1)&r2=0&r3=0&r4=12&r5=0&r7=0&r8=2022-01-01&r9=2022-01-01

from selenium import webdriver
import pandas as pd
import time
driver = webdriver.Chrome('C:\Webdriver\chromedriver.exe')
driver.get('https://www150.statcan.gc.ca/n1/pub/71-607-x/2021004/imp-eng.htm?r1=(1)&r2=0&r3=0&r4=12&r5=0&r7=0&r8=2022-01-01&r9=2022-01-01')
time.sleep(2)

for J in range (20):
    commodities = driver.find_elements_by_xpath('.//*[@id="report_table"]/tbody/tr["i"]/td[2]/a')
    Countries = driver.find_elements_by_xpath('.//*[@id="report_table"]/tbody/tr["i"]/td[4]')
    quantities = driver.find_elements_by_xpath('.//*[@id="report_table"]/tbody/tr["i"]/td[7]')
    weights = driver.find_elements_by_xpath('.//*[@id="report_table"]/tbody/tr["i"]/td[8]/abbr')

    Canada_Result=[]
    for i in range(25):
        temporary_data= {'Commodity': commodities[i].text,'Country': Countries[i].text,'quantity': quantities[i].text, 'weight': weights[i].text }
        Canada_Result.append(temporary_data)
    df_data = pd.DataFrame(Canada_Result)
    df_data
    df_data.to_excel('Canada_scrapping_result.xlsx', index=False)
    # click on the Next button    
    driver.find_element_by_xpath('//*[@id="report_results_next"]').click()
    time.sleep(1)

CodePudding user response:

I would suggest that before running the code, you create an Excel file with the desired name and column names; after that, do not create a fresh DataFrame each time, but instead read that Excel file.

Your problem is caused by the fact that every time the loop repeats, it discards the previous data and overwrites the already existing Excel file.

So instead of

df = pd.DataFrame(Canada_Result)

I would recommend reading the existing file with

df = pd.read_excel('Canada_scrapping_result.xlsx')

and then, on each iteration, appending the newly scraped rows to that DataFrame and writing the combined result back to the file, so each pass extends the data instead of replacing it.
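A minimal sketch of that append step, using `pd.concat`. The `existing` frame and `page_rows` list here are illustrative stand-ins for the Excel contents and one scraped page:

```python
import pandas as pd

def append_page(existing, page_rows):
    # Concatenate the rows already saved with the rows scraped
    # from the current page, renumbering the index
    return pd.concat([existing, pd.DataFrame(page_rows)], ignore_index=True)

# Stand-in for pd.read_excel('Canada_scrapping_result.xlsx')
existing = pd.DataFrame([{'Commodity': 'Wheat', 'Country': 'France'}])
# Stand-in for the rows scraped from the current page
page_rows = [{'Commodity': 'Corn', 'Country': 'Mexico'}]

combined = append_page(existing, page_rows)
print(len(combined))  # 2: the saved row plus the new one
```

In the real loop this would be followed by `combined.to_excel(...)`, though note that reading and rewriting the file on every page is slower than accumulating all rows in a list and writing once at the end.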

CodePudding user response:

You have to initialize the Canada_Result list before entering the outer for loop, then just append the new data inside it; when the outer loop has finished, convert the list to a DataFrame and export it to a file once.

from selenium import webdriver
import pandas as pd
import time

# Use a raw string so the backslashes in the Windows path are not
# treated as escape sequences
driver = webdriver.Chrome(r'C:\Webdriver\chromedriver.exe')
driver.get('https://www150.statcan.gc.ca/n1/pub/71-607-x/2021004/imp-eng.htm?r1=(1)&r2=0&r3=0&r4=12&r5=0&r7=0&r8=2022-01-01&r9=2022-01-01')
time.sleep(2)

# Initialize the result list once, before the loop, so every page
# appends to it instead of starting over
Canada_Result = []

for page in range(20):
    # The tr["i"] predicate in the original XPath was a string literal
    # with no effect; plain tr selects every row, which is the intent
    commodities = driver.find_elements_by_xpath('//*[@id="report_table"]/tbody/tr/td[2]/a')
    countries = driver.find_elements_by_xpath('//*[@id="report_table"]/tbody/tr/td[4]')
    quantities = driver.find_elements_by_xpath('//*[@id="report_table"]/tbody/tr/td[7]')
    weights = driver.find_elements_by_xpath('//*[@id="report_table"]/tbody/tr/td[8]/abbr')

    for i in range(25):
        Canada_Result.append({
            'Commodity': commodities[i].text,
            'Country': countries[i].text,
            'quantity': quantities[i].text,
            'weight': weights[i].text,
        })

    # Click the Next button to load the following page
    driver.find_element_by_xpath('//*[@id="report_results_next"]').click()
    time.sleep(1)

# Build the DataFrame once, after all pages have been scraped,
# and write the Excel file a single time
df_data = pd.DataFrame(Canada_Result)
df_data.to_excel('Canada_scrapping_result.xlsx', index=False)