Storing values from webpage with selenium

Time:10-28

I want to build a dictionary from the webpage I'm working on, using the text of the elements with one class as keys and the text of the elements with another class as the corresponding values.

Here's what I have tried:

from bs4 import BeautifulSoup
import pandas as pd
from selenium import webdriver
DRIVER_PATH = '/usr/local/bin/chromedriver'
driver = webdriver.Chrome(executable_path=DRIVER_PATH)

all_data = []
for i in range(1, 5, 1):
    url = 'https://www.transfermarkt.co.uk/cristiano-ronaldo/profil/spieler/' + str(i)
    driver.get(url)
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    data = {}
    market_left = soup.find('div', {'class':'right-td'})
    market_right = soup.find('div', {'class':'left-td'})
for m in market_left:
    for mr in market_right:
        print(data[m.text.strip()].append(mr.text.strip()))

However, I get the following error:

AttributeError: 'NavigableString' object has no attribute 'text'

Also, when I increase the range, for example to range(1, 10, 1), it doesn't seem to iterate over all the pages; it only picks up the last one. How can I grab the information for each page within the loop?
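The "only the last one" symptom comes from indentation: in the code above, the for m in market_left: loop sits at the same level as for i in range(1, 5, 1):, so it runs once, after the range loop has finished, using whatever market_left and market_right were left over from the final page. The same pattern in plain Python, with a hypothetical list of strings standing in for the downloaded pages:

```python
pages = ["page 1", "page 2", "page 3"]  # hypothetical stand-ins for driver.page_source

# Buggy: the processing step is dedented, so it runs once after the loop ends
for page in pages:
    current = page
processed_outside = [current]

# Fixed: the processing step is indented inside the loop, so it runs for every page
processed_inside = []
for page in pages:
    current = page
    processed_inside.append(current)

print(processed_outside)  # ['page 3']
print(processed_inside)   # ['page 1', 'page 2', 'page 3']
```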

Expected output:

{'Current market value:': ['-', '-', '-', '-', '-']}

Answer:

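Use find_all() rather than find(): find() returns only the first matching element, and iterating over a single Tag walks its child nodes, including plain NavigableString text nodes, which is what raised the AttributeError. Then use zip() to iterate over the two lists simultaneously, and keep the per-page processing inside the for i loop (in the question it is dedented, so it only runs once, after the last page has loaded).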
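A small offline sketch of the find() versus find_all() difference, using a hypothetical HTML snippet that only imitates the left-td layout (it is not copied from transfermarkt): looping over the single Tag returned by find() yields a mix of text nodes and tags, while find_all() returns a list of Tag objects.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical markup that only mimics the left-td layout
html = """
<div class="left-td">Current market value: <span>-</span></div>
<div class="left-td">Last update:</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag; iterating over it yields its child nodes
first = soup.find("div", {"class": "left-td"})
children = list(first)
print([type(c).__name__ for c in children])  # ['NavigableString', 'Tag']

# find_all() returns a list containing every matching Tag
divs = soup.find_all("div", class_="left-td")
print([d.get_text(strip=True) for d in divs])  # ['Current market value:-', 'Last update:']
```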

Try something like this:

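import time
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()  # assumes Selenium 4.6+, which can locate chromedriver itself

data = {}  # label -> list of values, one per page that has the label
for i in range(1, 10):
    url = "https://www.transfermarkt.co.uk/silvio-adzic/profil/spieler/{}".format(i)
    driver.get(url)
    time.sleep(5)  # crude fixed wait so the page can finish loading
    soup = BeautifulSoup(driver.page_source, 'html5lib')  # requires the html5lib package
    # find_all returns every matching div, not just the first one
    market_left = soup.find_all('div', class_="left-td")
    market_right = soup.find_all('div', class_="right-td")
    print(f"In Page {i}")
    # zip pairs each label div with the value div at the same position
    for l, r in zip(market_left, market_right):
        l_value = l.text.replace('\n', '').replace(' ', '')
        r_value = r.text.replace('\n', '').replace(' ', '')
        print(f"{l_value} {r_value}")
        # Add the value to the dictionary; setdefault creates the list on first use
        data.setdefault(l_value, []).append(r_value)
    print("-----------------------------------------------------------")

driver.quit()
print(data)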
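To check the pairing logic without launching a browser, the parsing step can be pulled into a function that takes an HTML string. This sketch assumes, as the answer's code does, that labels sit in left-td divs and values in right-td divs in matching order; the function name market_values and the sample markup are hypothetical. Using get_text(strip=True) instead of deleting every space keeps keys like 'Current market value:' readable, closer to the expected output in the question.

```python
from bs4 import BeautifulSoup


def market_values(html, data):
    """Append each right-td value to data[label], where label is the matching left-td text."""
    soup = BeautifulSoup(html, "html.parser")
    labels = soup.find_all("div", class_="left-td")
    values = soup.find_all("div", class_="right-td")
    # zip stops at the shorter list, so unmatched divs are ignored
    for label, value in zip(labels, values):
        key = label.get_text(strip=True)
        data.setdefault(key, []).append(value.get_text(strip=True))
    return data


# Hypothetical markup for two pages
page_a = '<div class="left-td">Current market value: </div><div class="right-td"> - </div>'
page_b = '<div class="left-td">Current market value:</div><div class="right-td">£1.08m</div>'

data = {}
for html in (page_a, page_b):
    market_values(html, data)
print(data)  # {'Current market value:': ['-', '£1.08m']}
```

In the Selenium loop, the call would be market_values(driver.page_source, data), placed inside the for i loop so it runs once per page.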

Note that a few of the pages don't have the data you are looking for, as the output shows:

In Page 1
Currentmarketvalue: -
Lastupdate: Apr23,2009
Highestmarketvalue:Lastupdate: £225Th.Oct4,2004
-----------------------------------------------------------
In Page 2
-----------------------------------------------------------
In Page 3
-----------------------------------------------------------
In Page 4
Currentmarketvalue: -
Lastupdate: Feb13,2007
Highestmarketvalue:Lastupdate: £360Th.Oct4,2004
-----------------------------------------------------------
In Page 5
Currentmarketvalue: -
Lastupdate: Sep14,2010
Highestmarketvalue:Lastupdate: £1.26mOct4,2004
-----------------------------------------------------------
In Page 6
Currentmarketvalue: -
Lastupdate: Aug2,2010
Highestmarketvalue:Lastupdate: £765Th.Oct6,2005
-----------------------------------------------------------
In Page 7
Currentmarketvalue: -
Lastupdate: Jan30,2014
Highestmarketvalue:Lastupdate: £1.08mOct4,2004
-----------------------------------------------------------
In Page 8
Currentmarketvalue: -
Lastupdate: Jan2,2010
Highestmarketvalue:Lastupdate: £1.35mJun2,2006
-----------------------------------------------------------
In Page 9
-----------------------------------------------------------
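Because some pages (2, 3 and 9 in the run above) have no market-value block, appending only what is found makes the lists in the dictionary different lengths, so the n-th entry no longer lines up with the n-th page. One way to keep them aligned, sketched here with hypothetical per-page results rather than live scraping (align_pages is an illustrative helper, not part of any library), is to record None for every label a page is missing:

```python
def align_pages(page_results):
    """page_results: one dict of {label: value} per page, possibly empty.
    Returns {label: [value or None for each page]}, one entry per page."""
    labels = []
    for result in page_results:
        for label in result:
            if label not in labels:
                labels.append(label)  # keep first-seen label order
    # dict.get returns None when a page lacks the label
    return {label: [result.get(label) for result in page_results] for label in labels}


# Hypothetical results for four pages; the second page had no data
pages = [
    {"Currentmarketvalue:": "-", "Lastupdate:": "Apr23,2009"},
    {},
    {"Currentmarketvalue:": "-", "Lastupdate:": "Feb13,2007"},
    {"Currentmarketvalue:": "-"},
]
print(align_pages(pages))
```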