I am trying to extract the authors' names and their affiliations from the webpage (URL given in the code below). In some cases the number of authors is large, and there is a 'Show All' button that must be clicked to see all of the authors' names.
driver_max_wait_time = 20
driver.get('https://inspirehep.net/literature?sort=mostrecent&size=25&page=1&q=arXiv:1311.4916')
# Wait for the element.
WebDriverWait(driver, driver_max_wait_time).until(EC.presence_of_element_located((By.CLASS_NAME, 'result-item-title')))
# Click the element we just waited for (click() returns None, so nothing is assigned).
driver.find_element(By.CLASS_NAME, 'result-item-title').click()
# Get the name of authors and their affiliations. May be in format a(U), b(U) etc.
# a, b are authors.
# U is some university.
WebDriverWait(driver, driver_max_wait_time).until(EC.presence_of_element_located((By.CLASS_NAME, '__InlineList__')))
auth_and_aff_text = driver.find_element(By.CLASS_NAME, '__InlineList__').text
if 'Show All' in auth_and_aff_text:
    print('Do something special')
    WebDriverWait(driver, driver_max_wait_time).until(EC.element_to_be_clickable((By.CLASS_NAME, '__SecondaryButton__'))).click()
    # Now we have clicked the Show All button.
As the snippet shows, the Show All button gets clicked. Can someone tell me how I can extract the authors' names from the small window/popup that opens?
As requested, I am editing the question to include screenshots.
The first driver.get call, i.e. https://inspirehep.net/literature?sort=mostrecent&size=25&page=1&q=arXiv:1311.4916, leads to the following page: [screenshot]
Next, we click the paper title, which leads to the following page: [screenshot]
Then we click the Show All button, which opens the small window: [screenshot]
This is where I am stuck. How can I extract information from this window/popup?
CodePudding user response:
names=[x.text for x in driver.find_elements(By.XPATH,"//div[@class='ant-modal-body']//a[@data-test-id]")]
This should grab all 9 names in that popup without the bracketed affiliations, if that's what you want.
Or, to get each entry together with its bracketed affiliation, use this XPath instead:
//div[@class='ant-modal-content']//div[@class='di']
CodePudding user response:
To collect data from the pop-up window, first hover over the Show All button using ActionChains, then click it.
Script:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
option = webdriver.ChromeOptions()
option.add_argument("start-maximized")
# Keep Chrome open after the script finishes.
option.add_experimental_option("detach", True)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()),options=option)
driver.get("https://inspirehep.net/literature?sort=mostrecent&size=25&page=1&q=arXiv:1311.4916")
wait = WebDriverWait(driver, 30)
# Wait until the paper title is clickable, then click it.
wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'result-item-title'))).click()
ActionChains(driver).move_to_element(wait.until(EC.element_to_be_clickable((By.CLASS_NAME, '__SecondaryButton__')))).perform()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CLASS_NAME, '__SecondaryButton__'))).click()
print(wait.until(EC.presence_of_element_located((By.XPATH, '//*[@]/div/ul'))).get_attribute('innerText'))
Output:
A.N. Cooke(Edinburgh U.), R. Horsley(Edinburgh U.), Y. Nakamura(RIKEN AICS, Kobe), D. Pleiter(Julich, Forschungszentrum and Regensburg U.), P.E.L. Rakow(Liverpool U., Dept. Math.), P. Shanahan(Adelaide U., Sch. Chem. Phys.), G. Schierholz(DESY), H. Stüben(U. Hamburg (main)), J.M. Zanotti(Adelaide U., Sch. Chem. Phys.)
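If you then need the names and affiliations as separate values, here is a minimal sketch of one way to split that output text. It is not part of either answer above: `split_authors` is a hypothetical helper, and it assumes the text keeps the `Name(Affiliation), Name(Affiliation)` shape shown in the output. Because affiliations can themselves contain commas and nested brackets (e.g. `RIKEN AICS, Kobe` or `U. Hamburg (main)`), it only splits on commas that sit outside any brackets.

```python
def split_authors(text):
    """Split 'Name(Affiliation), Name(Affiliation)' text into (name, affiliation) pairs.

    Hypothetical helper: assumes every affiliation is wrapped in one outer
    pair of round brackets directly after the author's name.
    """
    entries, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        # Only a comma outside all brackets separates two authors.
        if ch == "," and depth == 0:
            entries.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if current:
        entries.append("".join(current).strip())

    pairs = []
    for entry in entries:
        if not entry:
            continue
        # Everything before the first '(' is the name; the rest is the affiliation.
        name, sep, rest = entry.partition("(")
        affiliation = rest[:-1] if sep and rest.endswith(")") else rest
        pairs.append((name.strip(), affiliation.strip()))
    return pairs


if __name__ == "__main__":
    sample = "A.N. Cooke(Edinburgh U.), Y. Nakamura(RIKEN AICS, Kobe), H. Stüben(U. Hamburg (main))"
    for name, affiliation in split_authors(sample):
        print(name, "|", affiliation)
```

In the script above you would pass it the string returned by get_attribute('innerText') instead of the sample text.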