adding data to dataframe issue

Time:10-25

I have some code that scrapes a betting site. Here it is:

from os import pardir
import time
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from urllib.parse import urlparse, parse_qs
import re
import pandas as pd
from selenium.webdriver.remote.webelement import WebElement


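# Selenium 4 takes the chromedriver path through a Service object (Service is imported above)
driver = webdriver.Chrome(service=Service('C:/Users/tmarkac/source/repos/chromedriver.exe'))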
team_name = 'KYVO FC'

u =f'https://superbet.pl/wyszukaj?query={team_name}'


url = driver.get(u)
driver.maximize_window()
time.sleep(1)
driver.find_element(By.XPATH,'//*[@id="onetrust-accept-btn-handler"]').click()
time.sleep(1)
driver.find_element(By.CLASS_NAME,'pick__more-odds').click()
time.sleep(3)

#options = webdriver.ChromeOptions()
#prefs = {
#  "translate_whitelists": {"po":"en"},
#  "translate":{"enabled":"True"}
#}
#options.add_experimental_option('prefs', prefs)
#driver = webdriver.Chrome(chrome_options=options)

expand = driver.find_elements(By.CLASS_NAME,'icon.icon--md.event-row__expanded-market-icon.icon-chevron_down')
df = pd.DataFrame({'Market':[''],'Price1': [''],'Price2': [''],'Price3': [''],'Price4': [''],'Price5': [''],
                   'Price6': [''],'Price7': [''],'Price8': [''],'Price9': [''],'Price10': [''],'Price11': [''],'Price12': [''],
                   'Price1': [''],'Price13': [''],'Price14': [''],'Price15': [''],'Price16': [''],'Price17': ['']})


exp_clicks = 0
for i in expand:
    i.click()
    time.sleep(0.1)
    exp_clicks += 1
time.sleep(1)

market_data = []
market_list = driver.find_elements(By.CLASS_NAME,'event-row__expanded-market')

for market in market_list:
    name_elem = market.find_element(By.CLASS_NAME,'event-row__expanded-market-title')
    name = name_elem.text
    market_data.append(name)
    price_element = market.find_elements(By.CLASS_NAME,'value.new.actionable')
    for j in price_element:
        price = j.text
        market_data.append(price)
    # raises ValueError whenever len(market_data) differs from the number of columns
    df.loc[len(df)] = market_data
    market_data.clear()
print(df)
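One way to keep a fixed-width frame like the one above is to generate the column names and pad each market's row to that width before assigning it with df.loc. The sketch below assumes 18 columns, which is what the hand-written dict actually produces: it lists 'Price1' twice, and a duplicated dict key keeps only one entry. The helper name append_padded and the market names and prices are hypothetical.

```python
import pandas as pd

# Market plus Price1..Price17 = 18 columns, generated instead of typed out
columns = ['Market'] + [f'Price{i}' for i in range(1, 18)]
df = pd.DataFrame(columns=columns)


def append_padded(df, row):
    """Hypothetical helper: append one market row, padding it with '' to the frame's width."""
    if len(row) > len(df.columns):
        raise ValueError(f'{len(row)} values but only {len(df.columns)} columns')
    # df.loc[new_label] = values requires exactly one value per column
    df.loc[len(df)] = row + [''] * (len(df.columns) - len(row))


# Hypothetical scraped rows of different lengths
append_padded(df, ['Match result', '1.50', '3.40', '5.00'])
append_padded(df, ['Both teams to score', '1.70', '2.05'])
print(df.iloc[:, :5])
```

This still needs a guess at the maximum number of prices, which is the weakness of the fixed-column design.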

The line team_name = 'KYVO FC' acts as the input. KYVO FC is just a team name I copied and pasted from the betting site; any team name from the site can be pasted into the team_name variable, so please do that if you want to run the code.
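Because the team name goes straight into the search URL and can contain spaces (or Polish diacritics), it may be safer to percent-encode it first. A minimal sketch using the standard library's urllib.parse.quote; the variable name search_url is illustrative only, and whether the site needs this is an assumption, since Chrome usually encodes a raw space itself.

```python
from urllib.parse import quote

team_name = 'KYVO FC'
# quote() percent-encodes the space and any non-ASCII characters (as UTF-8)
search_url = f'https://superbet.pl/wyszukaj?query={quote(team_name)}'
print(search_url)  # https://superbet.pl/wyszukaj?query=KYVO%20FC
```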

The problem is with how I define my DataFrame and how I append data to it. Currently, this is how I define my DataFrame:

df = pd.DataFrame({'Market':[''],'Price1': [''],'Price2': [''],'Price3': [''],'Price4': [''],'Price5': [''],
                   'Price6': [''],'Price7': [''],'Price8': [''],'Price9': [''],'Price10': [''],'Price11': [''],'Price12': [''],
                   'Price1': [''],'Price13': [''],'Price14': [''],'Price15': [''],'Price16': [''],'Price17': ['']})

which is terrible. Even with my limited coding skills I can tell. The code goes to the betting site, finds the game using a query, expands all markets, and scrapes the market names and prices (this part works fine). The problem is that some markets have 2 prices, some have 3, and some have more. How can I define my DataFrame correctly when I don't know how many prices I will get for a particular market? In a nutshell, I need a way to append data to the DataFrame in the following way (I will translate the market names to English):


It would be great to know how to add the data to the DataFrame so that each row contains the data for one market. Thank you.
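A common pandas pattern for rows of unknown length is to collect one dict per market and build the DataFrame once at the end: pandas creates a column for every key it sees and fills the gaps with NaN. The sketch below uses hypothetical (name, prices) pairs standing in for name_elem.text and the list of j.text values from the scraping loop.

```python
import pandas as pd

# Hypothetical scraped data: (market name, list of prices) with varying lengths
scraped = [
    ('Match result', ['1.50', '3.40', '5.00']),
    ('Both teams to score', ['1.70', '2.05']),
    ('Total goals', ['1.30', '2.10', '3.25', '6.00']),
]

rows = []
for name, prices in scraped:
    # One dict per market; PriceN keys exist only for prices actually found
    row = {'Market': name}
    for n, price in enumerate(prices, start=1):
        row[f'Price{n}'] = price
    rows.append(row)

# Build the frame once; missing PriceN cells become NaN, then blank them out
df = pd.DataFrame(rows).fillna('')
print(df)
```

Columns follow the order in which keys first appear, so the widest market simply adds Price4 at the end, and no column list has to be declared up front.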

Answer:

I'll post just the relevant part of the code as the rest is the same as in the original post.

import numpy as np  # needed for np.array and np.nan below

try:
    exp_clicks = 0
    for i in expand:
        i.click()
        time.sleep(0.2)
        exp_clicks += 1
    time.sleep(1)

    market_data = []
    market_list = driver.find_elements(By.CLASS_NAME,'event-row__expanded-market')

    counter = 0
    for market in market_list:
        name_elem = market.find_element(By.CLASS_NAME,'event-row__expanded-market-title')
        name = name_elem.text
        market_data.append(name)
        price_element = market.find_elements(By.CLASS_NAME,'value.new.actionable')
        for j in price_element:
            price = j.text
            market_data.append(price)
        # one row: the market name followed by however many prices it has
        market_df = pd.DataFrame(np.array(market_data).reshape(-1,len(market_data)))
        df = pd.concat([df,market_df], ignore_index = True, axis = 0).replace(np.nan,'')
        market_data.clear()
        counter += 1
except:
    pass

I created a DataFrame from the list and reshaped it into a single row, so the market name and its prices sit side by side:

market_df = pd.DataFrame(np.array(market_data).reshape(-1,len(market_data)))
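To see what the reshape does in isolation, here is a small sketch with hypothetical values: reshape(-1, n) with n equal to the list length turns the flat list into a 1 x n array, i.e. one row. Passing the list wrapped in another list to pd.DataFrame gives the same single row without NumPy.

```python
import numpy as np
import pandas as pd

market_data = ['Match result', '1.50', '3.40', '5.00']  # hypothetical values

# -1 lets NumPy infer the row count: 4 values / 4 columns = 1 row
arr = np.array(market_data).reshape(-1, len(market_data))
print(arr.shape)  # (1, 4)

market_df = pd.DataFrame(arr)
print(market_df)
```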

Then I concatenated the new DataFrame with the existing one and replaced the NaN values (which fill the missing price columns of shorter markets) with empty strings:

df = pd.concat([df,market_df], ignore_index = True, axis = 0).replace(np.nan,'')
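A variation on the same idea, sketched with hypothetical rows: collect the one-row frames in a list and call pd.concat once after the loop. The pandas docs note that concat copies the data, so concatenating inside the loop recopies the growing frame on every market. Frames of different widths are aligned on their column labels and the gaps become NaN; fillna('') has the same effect here as replace(np.nan, '').

```python
import pandas as pd

# Hypothetical per-market lists, as market_data would hold them
scraped_rows = [
    ['Match result', '1.50', '3.40', '5.00'],
    ['Both teams to score', '1.70', '2.05'],
]

frames = []
for market_data in scraped_rows:
    frames.append(pd.DataFrame([market_data]))  # one row per market

# Single concat at the end; the shorter row gets NaN in column 3
df = pd.concat(frames, ignore_index=True, axis=0).fillna('')
print(df)
```

Note that concat aligns on column labels: if df still holds the Market/PriceN columns from the question, the integer-labelled columns 0, 1, 2, ... are added next to those empty columns rather than filling them, so starting from an empty list of frames (or renaming the columns afterwards) keeps the result tidy.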