I'm trying to scrape data for a list of companies (currently testing with only one) from a website. I can't work out how to get the score I want, because I can only find the element that holds the formatting rather than the actual data. Could someone please help?
from selenium import webdriver
import time
from selenium.webdriver.support.select import Select
driver=webdriver.Chrome(executable_path=r'C:\webdrivers\chromedriver.exe')
driver.get('https://www.refinitiv.com/en/sustainable-finance/esg-scores')
driver.maximize_window()
time.sleep(1)
cookie= driver.find_element("xpath", '//button[@id="onetrust-accept-btn-handler"]')
try:
    cookie.click()
except:
    pass
company_name=driver.find_element("id",'searchInput-1')
company_name.click()
company_name.send_keys('Jumbo SA')
time.sleep(1)
search=driver.find_element("xpath", '//button[@]')
search.click()
time.sleep(2)
company_score = driver.find_elements("xpath",'//div[@]')
print(company_score)
That's what I have so far. I want the number "42" to be returned, but instead I get the following:
[<selenium.webdriver.remote.webelement.WebElement (session="bffa2fe80dd3785618b5c52d7087096d", element="62eaf2a8-d1a2-4741-8374-c0f970dfcbfe")>]
My issue is that the locator is not working.
//div[@] = I think this part is wrong, but I am not sure what I need to pick from the website.
CodePudding user response:
Please use requests instead. Look at this example:
import requests
url = "https://www.refinitiv.com/bin/esg/esgsearchsuggestions"
payload = ""
response = requests.request("GET", url, data=payload)
print(response.text)
This returns something like this:
[
{
"companyName": "GEK TERNA Holdings Real Estate Construction SA",
"ricCode": "HRMr.AT"
},
{
"companyName": "Mytilineos SA",
"ricCode": "MYTr.AT"
},
{
"companyName": "Hellenic Telecommunications Organization SA",
"ricCode": "OTEr.AT"
},
{
"companyName": "Jumbo SA",
"ricCode": "BABr.AT"
},
{
"companyName": "Folli Follie Commercial Manufacturing and Technical SA",
"ricCode": "HDFr.AT"
}
]
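Since the suggestions come back as a list of objects with companyName and ricCode fields, the code for a given company can also be picked out programmatically rather than by eye. A minimal sketch, assuming the response has already been parsed into a Python list (for example with response.json()); find_ric_code is a hypothetical helper name, not part of requests or the site's API:

```python
def find_ric_code(suggestions, company_name):
    """Return the ricCode whose companyName matches, or None if absent."""
    wanted = company_name.strip().lower()
    for entry in suggestions:
        # Compare case-insensitively so "jumbo sa" also matches "Jumbo SA".
        if entry.get("companyName", "").strip().lower() == wanted:
            return entry.get("ricCode")
    return None


# Two entries copied from the sample output above.
suggestions = [
    {"companyName": "Mytilineos SA", "ricCode": "MYTr.AT"},
    {"companyName": "Jumbo SA", "ricCode": "BABr.AT"},
]
print(find_ric_code(suggestions, "Jumbo SA"))
```

Returning None for an unknown name lets the caller decide whether to skip that company or stop, which may be useful when looping over a longer list.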
Here we can see the company name and the code behind it, so for Jumbo SA it's BABr.AT. Now, with this info, let's get the data:
import requests
url = "https://www.refinitiv.com/bin/esg/esgsearchresult"
querystring = {"ricCode":"BABr.AT"} ## supply the company code
payload = ""
headers = {"cookie": "encaddr=NeVecfNa7/R1rLeYOqY57g=="}
response = requests.request("GET", url, data=payload, headers=headers, params=querystring)
print(response.text)
Now we see the response is in JSON:
{
"industryComparison": {
"industryType": "Specialty Retailers",
"scoreYear": "2020",
"rank": "162",
"totalIndustries": "281"
},
"esgScore": {
"TR.TRESGCommunity": {
"score": 24,
"weight": 0.13
},
"TR.TRESGInnovation": {
"score": 9,
"weight": 0.05
},
"TR.TRESGHumanRights": {
"score": 31,
"weight": 0.08
},
"TR.TRESGShareholders": {
"score": 98,
"weight": 0.08
},
"TR.SocialPillar": {
"score": 43,
"weight": 0.42999998
},
"TR.TRESGEmissions": {
"score": 19,
"weight": 0.08
},
"TR.TRESGManagement": {
"score": 47,
"weight": 0.26
},
"TR.GovernancePillar": {
"score": 53,
"weight": 0.38999998569488525
},
"TR.TRESG": {
"score": 42,
"weight": 1
},
"TR.TRESGWorkforce": {
"score": 52,
"weight": 0.1
},
"TR.EnvironmentPillar": {
"score": 20,
"weight": 0.19
},
"TR.TRESGResourceUse": {
"score": 30,
"weight": 0.06
},
"TR.TRESGProductResponsibility": {
"score": 62,
"weight": 0.12
},
"TR.TRESGCSRStrategy": {
"score": 17,
"weight": 0.05
}
}
}
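The 42 that the question asks for appears in this response under esgScore, in the TR.TRESG entry. A minimal sketch of pulling it out, assuming the body has been parsed into a dictionary (for example with response.json()) and assuming TR.TRESG is the overall score, which its weight of 1 suggests but the response does not state; the sample below is a trimmed copy of the response above and overall_esg_score is a hypothetical helper name:

```python
def overall_esg_score(result):
    """Return the TR.TRESG score from a parsed search result, or None."""
    # Chained .get calls avoid a KeyError if a level is missing.
    return result.get("esgScore", {}).get("TR.TRESG", {}).get("score")


# Trimmed copy of the JSON response shown above.
sample = {
    "esgScore": {
        "TR.TRESG": {"score": 42, "weight": 1},
        "TR.EnvironmentPillar": {"score": 20, "weight": 0.19},
    }
}
print(overall_esg_score(sample))
```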
Now you can get the data you want without using Selenium. This way it's faster and easier.
Please accept this as an answer.
CodePudding user response:
Thank you so much for your help. I'm sorry, I don't think I was very clear in my question (I'm quite new to this).
My issue is that the locator is not working.
//div[@] = I think this part is wrong, but I am not sure what I need to pick from the website.
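On the Selenium side, the output printed in the question is the default representation of WebElement objects: find_elements returns a list of elements, and the visible text of each one is available through its .text attribute. A minimal sketch of reading that text, using a hypothetical FakeElement stand-in so it runs without a browser; element_texts is likewise a made-up helper name:

```python
from dataclasses import dataclass


@dataclass
class FakeElement:
    """Hypothetical stand-in for Selenium's WebElement; only .text is modelled."""
    text: str


def element_texts(elements):
    """Return the stripped visible text of each element, skipping empty ones."""
    return [el.text.strip() for el in elements if el.text.strip()]


# Stands in for the list returned by driver.find_elements(...).
company_score = [FakeElement(" 42 "), FakeElement("")]
print(element_texts(company_score))
```

With a real driver the same helper would be called on the list returned by driver.find_elements(...). The locator itself still has to match the element that holds the score, which this sketch does not address.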