Appending DataFrames to an empty dataframe fails. Only shows the last DataFrame

Time:12-16

I am running the following code, and it completes without errors:

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from datetime import date, datetime

currentDay = datetime.now().day
currentMonth = datetime.now().month
currentYear = datetime.now().year
date = f'{currentYear}-{currentMonth}-{currentDay}'

url = f'https://www.racingandsports.com.au/form-guide/GenerateRaceGuide?discipline=thoroughbred&country=south-korea&course=busan&date={date}&meetingId=291056&cols=[{"name":"HTab","title":"Tab","type":"ND","size":"0.75"},{"name":"FormFigs5","title":"Form","type":"ND","size":"0.9375"},{"name":"HName","title":"Horse","type":"S","size":"2.25"},{"name":"HBP","title":"BP","type":"ND","size":"0.75"},{"name":"Jockey","title":"Jockey","type":"S","size":"2.25"},{"name":"Trainer","title":"Trainer","type":"S","size":"2.25"},{"name":"HWeight","title":"Wt","type":"D","size":"0.75"}]&addCols=["prizemoney"]&fs=S&page=port&preview=true'

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')

driver = webdriver.Chrome('chromedriver', options=chrome_options)

driver.get(url)

appended_data = pd.DataFrame()

for i in range(1,10):
  data = driver.find_element(By.XPATH, f'//*[@id="parent"]/div[{i}]/div[2]/div/table').get_attribute('outerHTML')
  data_hdf = pd.read_html((data))[0]
  #listt = data_hdf.values.tolist()
  appended=appended_data.append(data_hdf)

appended

Instead of getting all 9 tables, I only get the last one, even though each table is appended inside the loop.

Please help me out.

CodePudding user response:

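The main problem is that appended_data is never updated. DataFrame.append() does not modify the DataFrame in place; it returns a new one. Your loop stores that result in a different variable, appended, so on every iteration the still-empty appended_data is combined with the current table and the previous result is overwritten. That is why only the last table survives. To accumulate every DataFrame produced by read_html() inside the loop, assign the result back to appended_data, as in the last line of the code below.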

Also use concat() instead of append(): DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0. (https://pandas.pydata.org/docs/reference/api/pandas.concat.html#pandas.concat)

appended_data = pd.DataFrame()

for i in range(1,10):
    data = driver.find_element(By.XPATH, f'//*[@id="parent"]/div[{i}]/div[2]/div/table').get_attribute('outerHTML')
    data_hdf = pd.read_html((data))[0]
 
    appended_data = pd.concat([appended_data, data_hdf])
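
To see the difference without a browser, here is a minimal sketch that swaps the scraped tables for three tiny hand-made DataFrames (the data and the names tables, buggy_rows and fixed_rows are made up for illustration). It uses pd.concat() for both loops, since DataFrame.append() no longer exists in pandas 2.0; the first loop reproduces the pattern from the question, the second assigns the result back to the accumulator.

```python
import pandas as pd

# Three small stand-in tables (hypothetical data in place of the scraped race tables).
tables = [pd.DataFrame({"Tab": [i], "Horse": [f"horse_{i}"]}) for i in range(1, 4)]

# Pattern from the question: the accumulator `appended_data` is never reassigned,
# so every pass combines an empty frame with only the current table.
appended_data = pd.DataFrame()
for df in tables:
    appended = pd.concat([appended_data, df])
buggy_rows = len(appended)  # only the last table survives

# Fixed pattern: assign the result back to the accumulator.
appended_data = pd.DataFrame()
for df in tables:
    appended_data = pd.concat([appended_data, df], ignore_index=True)
fixed_rows = len(appended_data)  # rows from all three tables
```

Passing ignore_index=True is optional; it just renumbers the rows 0..n-1 instead of repeating each table's own index.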

CodePudding user response:

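appended_data = []

for i in range(1,10):
    # No trailing comma here: a trailing comma would turn data into a tuple.
    data = driver.find_element(By.XPATH, f'//*[@id="parent"]/div[{i}]/div[2]/div/table').get_attribute('outerHTML')
    data_hdf = pd.read_html(data)[0]
    # Collect each table in a plain Python list.
    appended_data.append(data_hdf)
# Combine all collected tables in a single step.
appended_data_pd = pd.concat(appended_data, ignore_index=True)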

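Appending items to a DataFrame one at a time is slow, because every append copies the whole DataFrame. It's not made for that; collect the pieces in a list and combine them once at the end.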
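
As a rough sketch of the collect-then-combine approach, the following uses three small made-up DataFrames in place of the scraped tables. The race column added with assign() is not part of the original code; it is an assumption, included only to show one way of recording which table each row came from.

```python
import pandas as pd

# Hypothetical stand-ins for the nine scraped tables; three are enough to illustrate.
tables = [
    pd.DataFrame({"Tab": [1, 2], "Horse": ["A", "B"]}),
    pd.DataFrame({"Tab": [1, 2], "Horse": ["C", "D"]}),
    pd.DataFrame({"Tab": [1], "Horse": ["E"]}),
]

frames = []
for race_no, df in enumerate(tables, start=1):
    # Tag each row with the table it came from (the "race" column is an assumption).
    frames.append(df.assign(race=race_no))

# One concat at the end copies the data once instead of once per iteration.
combined = pd.concat(frames, ignore_index=True)
```

In the scraping loop, the same idea would mean appending pd.read_html(data)[0].assign(race=i) to the list and calling pd.concat() once after the loop.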

CodePudding user response:

The issue is that you only see the last table in the result, even though the loop visits all nine tables. This happens because the result of append() is stored in a separate variable, appended, while appended_data itself is never updated and stays empty. Each iteration therefore overwrites appended with just the current table.

To fix this issue, you can try modifying the loop like this:

data = []

for i in range(1,10):
  html = driver.find_element(By.XPATH, f'//*[@id="parent"]/div[{i}]/div[2]/div/table').get_attribute('outerHTML')
  df = pd.read_html((html))[0]
  data.extend(df.to_dict('records'))

appended_data = pd.DataFrame(data)

This collects the rows of every table in a plain list (one dictionary per row) and builds the DataFrame once after the loop, rather than appending to a DataFrame on each iteration. The result is a single DataFrame containing all of the data from the nine tables.
