Python bs4: how to scrape this text from HTML?

The site URL: https://n.news.naver.com/mnews/article/421/0006111920

I want to scrape the "5" from the HTML below.

I used this code: soup.select_one('span.u_likeit_text._count').get_text()

The result is '추천' (Korean for "recommend") instead of the number.

HTML code:

<span class="u_likeit_text _count num">5</span>

CodePudding user response：

The main issue is that the count is generated dynamically by JavaScript, so it is not present in the HTTP response and therefore not in your soup.
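
To make the difference concrete, here is a small offline sketch. Both HTML strings are hypothetical stand-ins: the first assumes the server sends the span with a placeholder label (which would explain the '추천' result in the question), the second assumes JavaScript later replaces it with the count. The real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: what the server might send before JavaScript runs.
static_html = '<span class="u_likeit_text _count num">추천</span>'
# Hypothetical markup: the same element after JavaScript fills in the count.
rendered_html = '<span class="u_likeit_text _count num">5</span>'

# The selector from the question; it is valid and matches in both cases.
selector = "span.u_likeit_text._count"

static_text = BeautifulSoup(static_html, "html.parser").select_one(selector).get_text()
rendered_text = BeautifulSoup(rendered_html, "html.parser").select_one(selector).get_text()

print(static_text)    # the placeholder label
print(rendered_text)  # the count
```

The selector finds the element either way; only the text inside it changes once the script has run.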

You could use Selenium to render the page as a browser would, then pass driver.page_source to BeautifulSoup:

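from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import time

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

driver.get("https://n.news.naver.com/mnews/article/421/0006111920")
time.sleep(3)  # crude fixed wait so the JavaScript can fill in the count

soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()  # close the browser once the rendered HTML is captured

print(soup.select_one('span.u_likeit_text._count').get_text())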
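
If the script has not finished when the page source is captured, the span may still hold the placeholder text. A minimal sketch of a guard for that case is below. The name parse_like_count is made up for illustration, and stripping commas as thousands separators is an assumption about how larger counts might be displayed.

```python
from bs4 import BeautifulSoup


def parse_like_count(page_source):
    """Return the like count as an int, or None if it is not rendered yet."""
    soup = BeautifulSoup(page_source, "html.parser")
    span = soup.select_one("span.u_likeit_text._count")
    if span is None:
        return None
    # Assumption: larger counts may be shown with thousands separators.
    text = span.get_text(strip=True).replace(",", "")
    return int(text) if text.isdigit() else None


# Hypothetical page sources standing in for driver.page_source.
print(parse_like_count('<span class="u_likeit_text _count num">5</span>'))
print(parse_like_count('<span class="u_likeit_text _count num">추천</span>'))
```

With Selenium you would call parse_like_count(driver.page_source) and retry after a short pause while it returns None.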

CodePudding user response：

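If you use find_all with class_, pass the classes as a single space-separated string rather than joining them with dots. Dots are CSS selector syntax, which select_one accepts; class_ with several classes only matches when the string equals the class attribute exactly.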

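from bs4 import BeautifulSoup

# Sample markup with the count already present
soup = BeautifulSoup("<span class='u_likeit_text _count num'>5</span>", 'html.parser')
print(soup)
# class_ takes the full class attribute as one space-separated string
count_spans = soup.find_all("span", class_="u_likeit_text _count num")
print(count_spans[0].text)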
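
For comparison, a short sketch of how the two lookup styles behave on the same sample markup. The variable names are only for illustration.

```python
from bs4 import BeautifulSoup

html = "<span class='u_likeit_text _count num'>5</span>"
soup = BeautifulSoup(html, "html.parser")

# CSS selector: dots chain classes; order and extra classes do not matter.
by_css = soup.select_one("span._count.u_likeit_text")

# class_ with several classes matches the exact attribute string only.
by_exact = soup.find_all("span", class_="u_likeit_text _count num")
by_reordered = soup.find_all("span", class_="_count u_likeit_text num")

# class_ with a single class matches any element carrying that class.
by_single = soup.find_all("span", class_="_count")

print(by_css.get_text(), len(by_exact), len(by_reordered), len(by_single))
# prints: 5 1 0 1
```

So the dotted selector in the question was not the problem; either style finds the span once the count is actually in the HTML.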