Web scraping using Python

I'm trying to get data for a list of companies (currently testing with only one) from a website. I can't work out how to get the score I want, because I can only find the formatting part of the page instead of the actual data. Could someone please help?

from selenium import webdriver
import time
from selenium.webdriver.support.select import Select

driver = webdriver.Chrome(executable_path=r'C:\webdrivers\chromedriver.exe')

driver.get('https://www.refinitiv.com/en/sustainable-finance/esg-scores')

driver.maximize_window()
time.sleep(1)

try:
    # Dismiss the cookie banner if it is present
    cookie = driver.find_element("xpath", '//button[@id="onetrust-accept-btn-handler"]')
    cookie.click()
except Exception:
    pass

company_name=driver.find_element("id",'searchInput-1')
company_name.click()
company_name.send_keys('Jumbo SA')
time.sleep(1)

search=driver.find_element("xpath", '//button[@]')
search.click()
time.sleep(2)

company_score = driver.find_elements("xpath",'//div[@]')

print(company_score)

That's what I have so far. I want the number "42" to come back in my results, but instead I get the following:

[<selenium.webdriver.remote.webelement.WebElement (session="bffa2fe80dd3785618b5c52d7087096d", element="62eaf2a8-d1a2-4741-8374-c0f970dfcbfe")>]

My issue is that the locator is not working.

I think the //div[@] part is wrong, but I am not sure what I need to pick from the website.

Website Screenshot

CodePudding user response：

Please use requests instead. Look at this example:

import requests

url = "https://www.refinitiv.com/bin/esg/esgsearchsuggestions"

payload = ""
response = requests.request("GET", url, data=payload)

print(response.text)

This returns something like this:

[
    {
        "companyName": "GEK TERNA Holdings Real Estate Construction SA",
        "ricCode": "HRMr.AT"
    },
    {
        "companyName": "Mytilineos SA",
        "ricCode": "MYTr.AT"
    },
    {
        "companyName": "Hellenic Telecommunications Organization SA",
        "ricCode": "OTEr.AT"
    },
    {
        "companyName": "Jumbo SA",
        "ricCode": "BABr.AT"
    },
    {
        "companyName": "Folli Follie Commercial Manufacturing and Technical SA",
        "ricCode": "HDFr.AT"
    },
    ...
]
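
As a rough sketch, assuming the response parses to a list of dicts shaped like the sample above, the code for one company can be picked out by name. The helper name find_ric_code is made up for illustration, and the sample is trimmed from the output shown above:

```python
import json

# Sample trimmed from the response shown above.
sample = '''[
    {"companyName": "Mytilineos SA", "ricCode": "MYTr.AT"},
    {"companyName": "Jumbo SA", "ricCode": "BABr.AT"}
]'''


def find_ric_code(companies, name):
    """Return the ricCode whose companyName matches name, or None.

    The comparison ignores case and surrounding whitespace.
    """
    wanted = name.strip().lower()
    for company in companies:
        if company.get("companyName", "").strip().lower() == wanted:
            return company.get("ricCode")
    return None


companies = json.loads(sample)
print(find_ric_code(companies, "Jumbo SA"))
```

With a live response, json.loads(response.text) would take the place of json.loads(sample).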

Here we can see each company name and the code behind it, so for Jumbo SA it's BABr.AT. Now, with this info, let's get the data:

import requests

url = "https://www.refinitiv.com/bin/esg/esgsearchresult"

querystring = {"ricCode": "BABr.AT"}  # supply the company code

payload = ""
headers = {"cookie": "encaddr=NeVecfNa7/R1rLeYOqY57g=="}

response = requests.request("GET", url, data=payload, headers=headers, params=querystring)

print(response.text)

Now we see the response is in JSON:

{
    "industryComparison": {
        "industryType": "Specialty Retailers",
        "scoreYear": "2020",
        "rank": "162",
        "totalIndustries": "281"
    },
    "esgScore": {
        "TR.TRESGCommunity": {
            "score": 24,
            "weight": 0.13
        },
        "TR.TRESGInnovation": {
            "score": 9,
            "weight": 0.05
        },
        "TR.TRESGHumanRights": {
            "score": 31,
            "weight": 0.08
        },
        "TR.TRESGShareholders": {
            "score": 98,
            "weight": 0.08
        },
        "TR.SocialPillar": {
            "score": 43,
            "weight": 0.42999998
        },
        "TR.TRESGEmissions": {
            "score": 19,
            "weight": 0.08
        },
        "TR.TRESGManagement": {
            "score": 47,
            "weight": 0.26
        },
        "TR.GovernancePillar": {
            "score": 53,
            "weight": 0.38999998569488525
        },
        "TR.TRESG": {
            "score": 42,
            "weight": 1
        },
        "TR.TRESGWorkforce": {
            "score": 52,
            "weight": 0.1
        },
        "TR.EnvironmentPillar": {
            "score": 20,
            "weight": 0.19
        },
        "TR.TRESGResourceUse": {
            "score": 30,
            "weight": 0.06
        },
        "TR.TRESGProductResponsibility": {
            "score": 62,
            "weight": 0.12
        },
        "TR.TRESGCSRStrategy": {
            "score": 17,
            "weight": 0.05
        }
    }
}

Now you can get the data you want without using Selenium. This way is faster and simpler, since no browser has to be launched.
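
To pull out just the number, here is a minimal sketch. It assumes the "TR.TRESG" entry is the overall score (it has weight 1 and a score of 42, the number asked for in the question) and that the JSON keeps the shape shown above. The function name overall_score is only illustrative:

```python
import json

# Sample trimmed from the response shown above.
sample = '''{
    "industryComparison": {"industryType": "Specialty Retailers", "scoreYear": "2020"},
    "esgScore": {
        "TR.SocialPillar": {"score": 43, "weight": 0.42999998},
        "TR.TRESG": {"score": 42, "weight": 1}
    }
}'''


def overall_score(data):
    """Return the score under esgScore -> TR.TRESG, or None if any key is missing."""
    return data.get("esgScore", {}).get("TR.TRESG", {}).get("score")


data = json.loads(sample)
print(overall_score(data))
```

With the request above, overall_score(response.json()) should do the same against the live data, provided the site still returns this structure.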

Please accept this as an answer.

CodePudding user response：

Thank you so much for your help. I'm sorry, I think I was not that clear in my question (I'm quite new to this).

My issue is that the locator is not working.

I think the //div[@] part is wrong, but I am not sure what I need to pick from the website.