Web scraping using Python

Time:12-19

I'm trying to get data for a list of companies (currently testing with only one) from a website. I can't work out how to get the score I want, because I can only find the formatting part of the page rather than the actual data. Could someone please help?

from selenium import webdriver
import time
from selenium.webdriver.support.select import Select

# Raw string so the backslashes in the Windows path are kept literally.
driver=webdriver.Chrome(executable_path=r'C:\webdrivers\chromedriver.exe')

driver.get('https://www.refinitiv.com/en/sustainable-finance/esg-scores')

driver.maximize_window()
time.sleep(1)

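try:
    # Dismiss the cookie banner if it is there. The lookup sits inside the
    # try block because find_element raises when the button is missing.
    cookie = driver.find_element("xpath", '//button[@id="onetrust-accept-btn-handler"]')
    cookie.click()
except Exception:
    pass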

company_name=driver.find_element("id",'searchInput-1')
company_name.click()
company_name.send_keys('Jumbo SA')
time.sleep(1)

search=driver.find_element("xpath", '//button[@]')
search.click()
time.sleep(2)

company_score = driver.find_elements("xpath",'//div[@]')

print(company_score)

That's what I have so far. I want the number "42" to come back in my results, but instead I get this:

[<selenium.webdriver.remote.webelement.WebElement (session="bffa2fe80dd3785618b5c52d7087096d", element="62eaf2a8-d1a2-4741-8374-c0f970dfcbfe")>]

My issue is that the locator is not working.

//div[@] = This part I think is wrong - but I am not sure what I need to pick from the website.
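For what it's worth, the printed list contains WebElement objects rather than their contents, which is why the output shows session and element IDs. In Selenium, an element's visible text is read from its `.text` property. The sketch below shows that step only; `element_texts` is a hypothetical helper name, and the elements are simple stand-ins so the example runs without a browser. It does not fill in the missing locator attribute.

```python
from types import SimpleNamespace


def element_texts(elements):
    """Return the stripped visible text of each element, skipping empty ones."""
    texts = []
    for el in elements:
        text = el.text.strip()
        if text:
            texts.append(text)
    return texts


# Stand-ins for WebElement objects (hypothetical data, no browser needed).
fake_elements = [SimpleNamespace(text=" 42 "), SimpleNamespace(text="")]
print(element_texts(fake_elements))
```

With a real driver, the same helper could be called on the list returned by `driver.find_elements(...)`, assuming the locator matches the element that holds the score.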


CodePudding user response:

Please use requests instead. Look at this example:

import requests

url = "https://www.refinitiv.com/bin/esg/esgsearchsuggestions"

# A plain GET is enough here; no request body is needed.
response = requests.get(url)

print(response.text)

This returns something like:

[
    {
        "companyName": "GEK TERNA Holdings Real Estate Construction SA",
        "ricCode": "HRMr.AT"
    },
    {
        "companyName": "Mytilineos SA",
        "ricCode": "MYTr.AT"
    },
    {
        "companyName": "Hellenic Telecommunications Organization SA",
        "ricCode": "OTEr.AT"
    },
    {
        "companyName": "Jumbo SA",
        "ricCode": "BABr.AT"
    },
    {
        "companyName": "Folli Follie Commercial Manufacturing and Technical SA",
        "ricCode": "HDFr.AT"
    },
    ...
]
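If you want to pick the code out programmatically rather than by eye, a lookup over that list might look like the sketch below. It assumes the response parses to a list of objects with `companyName` and `ricCode` keys, as in the sample above; `find_ric_code` is a hypothetical helper, and the embedded JSON is an abridged copy of the sample so the example runs offline.

```python
import json

# Abridged copy of the suggestions sample shown above.
suggestions_json = """
[
    {"companyName": "Mytilineos SA", "ricCode": "MYTr.AT"},
    {"companyName": "Jumbo SA", "ricCode": "BABr.AT"}
]
"""


def find_ric_code(suggestions, company_name):
    """Return the ricCode for an exact (case-insensitive) name match, else None."""
    wanted = company_name.strip().lower()
    for entry in suggestions:
        if entry.get("companyName", "").strip().lower() == wanted:
            return entry.get("ricCode")
    return None


suggestions = json.loads(suggestions_json)
print(find_ric_code(suggestions, "Jumbo SA"))
```

With a live response, `response.json()` would take the place of `json.loads(suggestions_json)`.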

In this list, each company name is paired with its code, so for Jumbo SA it's BABr.AT. Now, with this info, let's get the data:

import requests

url = "https://www.refinitiv.com/bin/esg/esgsearchresult"

querystring = {"ricCode": "BABr.AT"}  # supply the company code

# Cookie header as given in this example.
headers = {"cookie": "encaddr=NeVecfNa7/R1rLeYOqY57g=="}

# A GET with query parameters; no request body is needed.
response = requests.get(url, headers=headers, params=querystring)

print(response.text)

Now we see the response is JSON:

{
    "industryComparison": {
        "industryType": "Specialty Retailers",
        "scoreYear": "2020",
        "rank": "162",
        "totalIndustries": "281"
    },
    "esgScore": {
        "TR.TRESGCommunity": {
            "score": 24,
            "weight": 0.13
        },
        "TR.TRESGInnovation": {
            "score": 9,
            "weight": 0.05
        },
        "TR.TRESGHumanRights": {
            "score": 31,
            "weight": 0.08
        },
        "TR.TRESGShareholders": {
            "score": 98,
            "weight": 0.08
        },
        "TR.SocialPillar": {
            "score": 43,
            "weight": 0.42999998
        },
        "TR.TRESGEmissions": {
            "score": 19,
            "weight": 0.08
        },
        "TR.TRESGManagement": {
            "score": 47,
            "weight": 0.26
        },
        "TR.GovernancePillar": {
            "score": 53,
            "weight": 0.38999998569488525
        },
        "TR.TRESG": {
            "score": 42,
            "weight": 1
        },
        "TR.TRESGWorkforce": {
            "score": 52,
            "weight": 0.1
        },
        "TR.EnvironmentPillar": {
            "score": 20,
            "weight": 0.19
        },
        "TR.TRESGResourceUse": {
            "score": 30,
            "weight": 0.06
        },
        "TR.TRESGProductResponsibility": {
            "score": 62,
            "weight": 0.12
        },
        "TR.TRESGCSRStrategy": {
            "score": 17,
            "weight": 0.05
        }
    }
}
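To pull the "42" out of that response, you can parse the JSON and index into it. The sketch below assumes that `TR.TRESG` (the entry with weight 1) is the overall score, which matches the 42 asked about in the question, and that the pillar entries are the keys ending in `Pillar`. The function names are hypothetical, and the embedded JSON is an abridged copy of the response above so the example runs offline.

```python
import json

# Abridged copy of the response shown above.
response_text = """
{
    "esgScore": {
        "TR.SocialPillar": {"score": 43, "weight": 0.42999998},
        "TR.GovernancePillar": {"score": 53, "weight": 0.38999998569488525},
        "TR.TRESG": {"score": 42, "weight": 1},
        "TR.EnvironmentPillar": {"score": 20, "weight": 0.19}
    }
}
"""


def overall_score(data):
    # Assumption: "TR.TRESG" holds the overall ESG score.
    return data["esgScore"]["TR.TRESG"]["score"]


def pillar_scores(data):
    # Assumption: pillar-level entries are the keys ending in "Pillar".
    return {
        key: value["score"]
        for key, value in data["esgScore"].items()
        if key.endswith("Pillar")
    }


data = json.loads(response_text)
print(overall_score(data))
print(pillar_scores(data))
```

With a live response, `response.json()` would take the place of `json.loads(response_text)`.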

Now you can get the data you want without using Selenium. This way it's faster and simpler, because no browser has to be launched.

Please accept this as an answer.

CodePudding user response:

Thank you so much for your help. I'm sorry, I don't think I was clear enough in my question (I'm quite new to this).

My issue is that the locator is not working.

//div[@] = This part I think is wrong - but I am not sure what I need to pick from the website.
