How to use Selenium Python to get a field information of each linked page


The context is SpringerLink, for example this series of books.

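Method 1 (faster): get E-ISBN from the book URLs

Each book link on the series page ends with the 13-digit ISBN, so we can get the E-ISBN codes directly from those URLs, without loading a new page for each book: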

import requests
from bs4 import BeautifulSoup

url = 'https://www.springer.com/series/136/books'
soup = BeautifulSoup(requests.get(url).text, "html.parser")
titles = [title.text.strip() for title in soup.select('.c-card__title')]
EISBN = []
for a in soup.select('ul:last-child .c-meta__item:last-child a'):
    # a['href'] is something like https://www.springer.com/book/9783031256325
    c = a['href'].split('/')[-1]
    # insert four '-' into the number 9783031256325 to create the E-ISBN code
    EISBN.append(f'{c[:3]}-{c[3]}-{c[4:7]}-{c[7:12]}-{c[-1]}')

# print each E-ISBN next to its title
for i in range(len(titles)):
    print(EISBN[i], titles[i])

Output:

978-3-031-25632-5 Random Walks on Infinite Groups
978-3-031-19707-9 Drinfeld Modules
978-3-031-13379-4 Partial Differential Equations
978-3-031-00943-3 Stationary Processes and Discrete Parameter Markov Processes
978-3-031-14205-5 Measure Theory, Probability, and Stochastic Processes
978-3-030-56694-4 Quaternion Algebras
978-3-030-73839-6 Mathematical Logic
978-3-030-71250-1 Lessons in Enumerative Combinatorics
978-3-030-35118-2 Basic Representation Theory of Algebras
978-3-030-59242-4 Ergodic Dynamics
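
The hyphenation step in the loop above can be pulled out into a small helper that also rejects anything that is not a valid ISBN-13. This is only a sketch: the function name eisbn_from_href is hypothetical, and the fixed 3-1-3-5-1 grouping is the same assumption the loop makes. It matches the Springer codes listed above but is not a general ISBN hyphenation rule, since group lengths vary between publishers.

```python
import re

def eisbn_from_href(href: str) -> str:
    """Hypothetical helper: hyphenate the 13-digit ISBN that ends a book URL."""
    c = href.rstrip('/').split('/')[-1]
    if not re.fullmatch(r'\d{13}', c):
        raise ValueError(f'no 13-digit ISBN at the end of {href!r}')
    # ISBN-13 check: digits weighted 1, 3, 1, 3, ... must sum to a multiple of 10
    if sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(c)) % 10:
        raise ValueError(f'bad ISBN-13 check digit in {c}')
    # assumed fixed grouping 3-1-3-5-1, as in the loop above
    return f'{c[:3]}-{c[3]}-{c[4:7]}-{c[7:12]}-{c[-1]}'

print(eisbn_from_href('https://www.springer.com/book/9783031256325'))
# prints 978-3-031-25632-5
```

Raising an error on malformed links makes it obvious when the page layout changes and the last URL segment is no longer the ISBN, instead of silently producing a wrong code.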

Method 2 (slower): get E-ISBN by loading a page for each book

This method loads the details page for each book and extracts the E-ISBN code from it:

import requests
from bs4 import BeautifulSoup

url = 'https://www.springer.com/series/136/books'
soup = BeautifulSoup(requests.get(url).text, "html.parser")
books = soup.select('a[data-track-label^="article"]')
titles, EISBN = [], []

for book in books:
    # the link text is assumed to be the book title
    titles.append(book.text.strip())
    # load the details page of this book and read the E-ISBN from it
    soup_book = BeautifulSoup(requests.get(book['href']).text, "html.parser")
    EISBN.append(soup_book.select('p:has(span[data-test=electronic_isbn_publication_date]) .c-bibliographic-information__value')[0].text)

# print each E-ISBN next to its title
for i in range(len(titles)):
    print(EISBN[i], titles[i])

In case you are wondering, p:has(span[data-test=electronic_isbn_publication_date]) selects the p element that contains a span with the attribute data-test=electronic_isbn_publication_date.
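
To see how that selector behaves without hitting the site, it can be tried on a small inline snippet. The markup below is hypothetical: it only imitates the attribute and class names used in the selector, and the print ISBN is a placeholder, so the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the names used in the selector above
html = """
<div>
  <p>
    <span data-test="isbn_publication_date">Softcover ISBN</span>
    <span class="c-bibliographic-information__value">PRINT-ISBN-PLACEHOLDER</span>
  </p>
  <p>
    <span data-test="electronic_isbn_publication_date">eBook ISBN</span>
    <span class="c-bibliographic-information__value">978-3-031-25632-5</span>
  </p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
selector = ('p:has(span[data-test=electronic_isbn_publication_date]) '
            '.c-bibliographic-information__value')

# :has() keeps only the <p> that contains the matching span;
# the trailing class selector then picks the value inside that <p>
match = soup.select_one(selector)
eisbn = match.text.strip() if match else None
print(eisbn)
# prints 978-3-031-25632-5
```

Using select_one with a None check, rather than indexing [0], avoids an IndexError on pages where the electronic ISBN field is missing.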
