<dl class="book__details-item">
<dt class="book__details-name">
Место издания:
</dt>
<dt class="book__details-value">
Москва
</dt>
</dl>
<dl class="book__details-item">
<dt class="book__details-name">
Издательство:
</dt>
<dt class="book__details-value">
<a href="/publishers/5558/" target="blank">Манн, Иванов и Фербер</a>
</dt>
</dl>
<dl class="book__details-item">
<dt class="book__details-name">
Год издания:
</dt>
<dt class="book__details-value">
2021
</dt>
</dl>
I'm scraping a bookstore website and need to extract the publishing year, but I can't get to it, because every element of the book's description sits inside identically named blocks.
import requests
from bs4 import BeautifulSoup

# HEADERS, final_linklist, final_publist and page are assumed to be defined elsewhere in the script.

def get_html(url, params=None):
    r = requests.get(url, headers=HEADERS, params=params)
    return r

def get_content(html):  # Here's the part where it gets confusing
    years = []
    soup = BeautifulSoup(html, "html.parser")
    items = soup.find("div", class_="book__details-left")
    # find() returns only the first matching <dl>, which is not necessarily the year
    smalleritems = items.find("dl", class_="book__details-item")
    smalleritems = smalleritems.find("dt", class_="book__details-value")
    smalleritems = smalleritems.get_text()
    print(smalleritems)

def parse(URL):
    html = get_html(URL)
    if html.status_code == 200:
        midlinklist = get_content(html.text)
        return midlinklist
    else:
        print("Error")

for URL in final_linklist:
    print(str(URL))
    print("Parsing", page, "pages out of", len(final_linklist) - page)
    page = page + 1
    midl = parse(str(URL))
    for pubs in midl:
        final_publist.append(pubs)
My code is not finished, because I can't quite get to the text "2021" inside:
<dt class="book__details-value">
2021
</dt>
CodePudding user response:
You can get the last tag with the class book__details-value using [-1] indexing:
soup = BeautifulSoup(html, "html.parser")
print(soup.find_all("dt", class_="book__details-value")[-1].get_text(strip=True))
Output:
2021
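The [-1] approach depends on the year being the last detail on the page. As a rough sketch of an alternative that does not depend on position, each label can be paired with its value and looked up by name. The sample markup is trimmed from the question, and extract_details is a hypothetical helper name, not something from the original code:

```python
from bs4 import BeautifulSoup

# Sample markup trimmed from the question; the wrapping div is the one
# the question's code searches for.
SAMPLE_HTML = """
<div class="book__details-left">
<dl class="book__details-item">
<dt class="book__details-name">Место издания:</dt>
<dt class="book__details-value">Москва</dt>
</dl>
<dl class="book__details-item">
<dt class="book__details-name">Год издания:</dt>
<dt class="book__details-value">2021</dt>
</dl>
</div>
"""

def extract_details(html):
    """Map each detail label to its value (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    details = {}
    for item in soup.find_all("dl", class_="book__details-item"):
        name = item.find("dt", class_="book__details-name")
        value = item.find("dt", class_="book__details-value")
        # Skip blocks that are missing either half of the pair.
        if name and value:
            details[name.get_text(strip=True)] = value.get_text(strip=True)
    return details

details = extract_details(SAMPLE_HTML)
print(details.get("Год издания:"))
```

Using details.get() returns None instead of raising when a book page has no year listed, which may be convenient when looping over many URLs.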
CodePudding user response:
I would anchor to the correct preceding dt by its class and the text it contains, i.e. Год издания: (translated: "Year of publication:"). Once that is matched, use an adjacent sibling combinator (+) to move to the adjacent element, which has the class book__details-value:
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('<individual book url>')
soup = bs(r.content, 'lxml')
print(int(soup.select_one('.book__details-name:-soup-contains("Год издания:") + .book__details-value').text.strip()))
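To try the sibling-selector idea without hitting the site, here is a minimal offline sketch run against the markup from the question. It assumes a Beautiful Soup install whose bundled Soup Sieve is recent enough to support :-soup-contains, uses html.parser instead of lxml to avoid the extra dependency, and get_year is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Sample markup trimmed from the question.
SAMPLE_HTML = """
<dl class="book__details-item">
<dt class="book__details-name">Место издания:</dt>
<dt class="book__details-value">Москва</dt>
</dl>
<dl class="book__details-item">
<dt class="book__details-name">Год издания:</dt>
<dt class="book__details-value">2021</dt>
</dl>
"""

# Match the label by class and text, then step to its adjacent sibling.
YEAR_SELECTOR = (
    '.book__details-name:-soup-contains("Год издания:")'
    ' + .book__details-value'
)

def get_year(html):
    """Return the publishing year as an int, or None if not found (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(YEAR_SELECTOR)
    if node is None:
        return None
    return int(node.get_text(strip=True))

print(get_year(SAMPLE_HTML))
```

Returning None when the selector finds nothing keeps a missing year from raising an AttributeError in the middle of a long crawl.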