Insert a value in a search bar, select the autocomplete result and get a value with bs4

I am trying to use Beautiful Soup to read a value from a web page. The following steps are necessary:

1. Go to the web page: url = 'https://www.msci.com/our-solutions/esg-investing/esg-fund-ratings/funds/'
2. Insert the ISIN in the search bar.
3. Select (click) the autocomplete result in the container msci-ac-search-data-dropdown.
4. Read the value from the div with class "ratingdata-outercircle esgratings-profile-header-green" to get the text "ratingdata-fund-rating esg-fund-ratings-circle-aaa" (the class attribute of the rating element inside it).
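Step 4 boils down to reading the class attribute of the rating div. As a minimal sketch, assuming (hypothetically) that the rendered header nests a div with class ratingdata-fund-rating inside the outer circle div, the class list can be extracted as below. This swaps in Python's built-in html.parser for Beautiful Soup only so the example runs without third-party packages; SAMPLE_HTML is an assumed snippet, not markup copied from msci.com.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Hypothetical snippet assumed to mirror the rendered profile header; not copied from msci.com.
SAMPLE_HTML = """
<div class="ratingdata-outercircle esgratings-profile-header-green">
  <div class="ratingdata-fund-rating esg-fund-ratings-circle-aaa"></div>
</div>
"""


class RatingClassParser(HTMLParser):
    """Records the class list of the first div whose classes include `target`."""

    def __init__(self, target: str = "ratingdata-fund-rating") -> None:
        super().__init__()
        self.target = target
        self.classes: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        # Only look at divs, and stop once a match has been recorded.
        if tag != "div" or self.classes is not None:
            return
        # attrs is a list of (name, value) pairs; value may be None.
        class_list = (dict(attrs).get("class") or "").split()
        if self.target in class_list:
            self.classes = class_list


def read_rating_classes(html: str) -> Optional[List[str]]:
    """Return the class list of the rating div, or None if it is not present."""
    parser = RatingClassParser()
    parser.feed(html)
    parser.close()
    return parser.classes


print(read_rating_classes(SAMPLE_HTML))
```

With Beautiful Soup the same lookup is a single CSS selector, which is what the first answer does with select_one(".ratingdata-fund-rating")["class"].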

So far I have tried the following:


import requests
from bs4 import BeautifulSoup

isin = 'IE00B4L5Y983'

url = 'https://www.msci.com/our-solutions/esg-investing/esg-fund-ratings/funds/'
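# Fetch the static HTML of the fund ratings page
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Collect the hidden inputs of the site-wide search form and add the ISIN as the query
# (the payload is built here but not submitted yet)
payload = {}
for i in soup.select('form[action="https://www.msci.com/search"] input[value]'):
    payload[i['name']] = i['value']
payload['UQ_txt'] = isin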

Answer:

Try calling the JSON endpoints behind the search bar directly, without a browser:

import requests
from bs4 import BeautifulSoup

isin = "IE00B4L5Y983"
url = "https://www.msci.com/our-solutions/esg-investing/esg-fund-ratings"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0",
    "X-Requested-With": "XMLHttpRequest",
}

params = {
    "p_p_id": "esg_fund_ratings_profile",
    "p_p_lifecycle": "2",
    "p_p_state": "normal",
    "p_p_mode": "view",
    "p_p_resource_id": "searchFundRatingsProfiles",
    "p_p_cacheability": "cacheLevelPage",
    "_esg_fund_ratings_profile_keywords": isin,
}

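# Step 1: call the search endpoint behind the search bar; it returns a JSON list of matching funds
data = requests.get(url, params=params, headers=headers).json()

# Step 2: request the rating profile of the first match
params = {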
    "p_p_id": "esg_fund_ratings_profile",
    "p_p_lifecycle": "2",
    "p_p_state": "normal",
    "p_p_mode": "view",
    "p_p_resource_id": "showEsgFundRatingsProfile",
    "p_p_cacheability": "cacheLevelPage",
    "_esg_fund_ratings_profile_fundShareClassId": data[0]["url"],
}

# Send the fund's profile page URL as the Referer, as the browser would
headers["Referer"] = "https://www.msci.com/our-solutions/esg-investing/esg-fund-ratings/funds/{}/{}".format(
    data[0]["encodedTitle"], data[0]["url"]
)

soup = BeautifulSoup(
    requests.get(url, params=params, headers=headers).content, "html.parser"
)
# The rating is encoded in the class list of the .ratingdata-fund-rating element
data = soup.select_one(".ratingdata-fund-rating")["class"]
print(data)

Prints:

['ratingdata-fund-rating', 'esg-fund-ratings-circle-aaa']
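The code above indexes data[0] directly, so an ISIN with no search results fails with a bare IndexError. A minimal sketch of a hypothetical helper, build_profile_request, that builds the second request's params and Referer from the search results and fails with a clearer message; the field names url and encodedTitle are taken from the answer, while the sample entry is a placeholder, not real MSCI data.

```python
from typing import Any, Dict, List, Tuple

PROFILE_BASE = "https://www.msci.com/our-solutions/esg-investing/esg-fund-ratings/funds"


def build_profile_request(results: List[Dict[str, Any]]) -> Tuple[Dict[str, str], str]:
    """Return (params, referer) for the showEsgFundRatingsProfile request.

    `results` is the parsed JSON list from the searchFundRatingsProfiles call;
    only the first match is used, as in the answer above.
    """
    if not results:
        raise LookupError("search returned no funds for this ISIN")
    first = results[0]
    params = {
        "p_p_id": "esg_fund_ratings_profile",
        "p_p_lifecycle": "2",
        "p_p_state": "normal",
        "p_p_mode": "view",
        "p_p_resource_id": "showEsgFundRatingsProfile",
        "p_p_cacheability": "cacheLevelPage",
        "_esg_fund_ratings_profile_fundShareClassId": first["url"],
    }
    referer = "{}/{}/{}".format(PROFILE_BASE, first["encodedTitle"], first["url"])
    return params, referer


# Hypothetical placeholder entry, shaped like one element of the search response.
sample = [{"title": "Example Fund", "encodedTitle": "example-fund", "url": "ABC123"}]
params, referer = build_profile_request(sample)
print(referer)
```

In the answer's script, the second params dict and the Referer line could then be replaced by params, headers["Referer"] = build_profile_request(data).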

Answer:

When you press Enter, the page sends another request whose response already contains the search result. You can call that endpoint directly to get what you want:

import requests

isin = 'IE00B4L5Y983'
# Call the search endpoint directly; the response is a JSON list of matching funds
url = f"https://www.msci.com/our-solutions/esg-investing/esg-fund-ratings?p_p_id=esg_fund_ratings_profile&p_p_lifecycle=2&p_p_state=normal&p_p_mode=view&p_p_resource_id=searchFundRatingsProfiles&p_p_cacheability=cacheLevelPage&_esg_fund_ratings_profile_keywords={isin}"
for title in requests.get(url).json():
    print(title['title'])

OUTPUT:

iShares Core MSCI World UCITS ETF USD (Acc)
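The hand-written f-string works for an ISIN, but it does not escape the keyword, so a search term containing spaces or "&" would produce a broken query. A minimal sketch that builds the same URL with urllib.parse.urlencode instead; search_url is a hypothetical helper name, and the parameters are the ones from the URL above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://www.msci.com/our-solutions/esg-investing/esg-fund-ratings"


def search_url(keywords: str) -> str:
    """Build the search endpoint URL, percent-encoding the keyword."""
    query = {
        "p_p_id": "esg_fund_ratings_profile",
        "p_p_lifecycle": "2",
        "p_p_state": "normal",
        "p_p_mode": "view",
        "p_p_resource_id": "searchFundRatingsProfiles",
        "p_p_cacheability": "cacheLevelPage",
        "_esg_fund_ratings_profile_keywords": keywords,
    }
    return BASE_URL + "?" + urlencode(query)


url = search_url("IE00B4L5Y983")
print(url)

# Round-trip check: decoding the query string gives back the original keyword.
print(parse_qs(urlsplit(url).query)["_esg_fund_ratings_profile_keywords"])
```

Passing the same dict via requests.get(BASE_URL, params=query), as the first answer does, lets requests do this encoding for you.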

Answer:

If I may: from the OP's description, I can only infer that this is either an education-related test or a job-interview-related test. As such, following the exact instructions is paramount. Typing into the search bar and clicking an autocomplete result requires a browser that runs the page's JavaScript, which requests and BeautifulSoup cannot do, so to follow those instructions you need Selenium. The following code follows the steps exactly and gets the desired result:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--headless")

webdriver_service = Service("chromedriver/chromedriver")  # path to where you saved the chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)

url = 'https://www.msci.com/our-solutions/esg-investing/esg-fund-ratings/funds/'

try:
    browser.get(url)
    wait = WebDriverWait(browser, 20)

    # Step 2: type the ISIN into the search bar
    search_box = wait.until(EC.presence_of_element_located((By.ID, '_esg_fund_ratings_profile_keywords')))
    search_box.send_keys('IE00B4L5Y983')

    # Step 3: wait for the autocomplete list to appear, then click it
    wait.until(EC.visibility_of_element_located((By.ID, 'ui-id-1'))).click()

    # Step 4: wait for the profile header and read the class of the rating div inside it
    header = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'esgratings-profile-header-green')))
    result = header.find_element(By.TAG_NAME, "div").get_attribute('class')
    print(result)
finally:
    # Always close the browser, even if a wait times out
    browser.quit()

This prints:

ratingdata-fund-rating esg-fund-ratings-circle-aaa
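All three approaches return the rating only as a CSS class name. A minimal sketch of a hypothetical helper, rating_from_classes, that turns either form (the list from bs4's ["class"] or the space-separated string from Selenium's get_attribute('class')) into a rating string. It assumes the rating is always the suffix of a class starting with "esg-fund-ratings-circle-", which is inferred from the output above, not from any MSCI documentation.

```python
from typing import Iterable, Optional, Union

# Assumed prefix, inferred from the class names printed above.
RATING_PREFIX = "esg-fund-ratings-circle-"


def rating_from_classes(classes: Union[str, Iterable[str]]) -> Optional[str]:
    """Return the rating (e.g. 'AAA') encoded in the class names, or None if absent."""
    if isinstance(classes, str):
        # Selenium's get_attribute('class') returns one space-separated string.
        classes = classes.split()
    for name in classes:
        if name.startswith(RATING_PREFIX):
            return name[len(RATING_PREFIX):].upper()
    return None


print(rating_from_classes("ratingdata-fund-rating esg-fund-ratings-circle-aaa"))
print(rating_from_classes(["ratingdata-fund-rating", "esg-fund-ratings-circle-aaa"]))
```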