web scraping with a button "show more"

Time:03-30

I need to extract articles from this website including title, date and URL. https://en.news-front.info/category/ukraine-2/

I'm using the rvest package but I'm having difficulty extracting them due to the presence of the "show more" button that loads the other articles. How do I go about doing this? I need the articles through March 2021.

Thank you

CodePudding user response:

Try clicking the button before harvesting the data:

webElem <- remDr$findElement(using = 'css selector', ".btn-load-more")
webElem$clickElement()

Then, after this chunk, pause before scraping (not sure if it's necessary; unfortunately, you didn't provide any code):

Sys.sleep(2)  # give the newly loaded articles time to render
# YOUR HARVESTING CODE
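
A single click loads only one more batch, so getting back to March 2021 will probably take many clicks. Here is a minimal sketch of repeating the click until the button disappears or a limit is hit. `click_until_gone` is a hypothetical helper name, and it only assumes the driver has an RSelenium-style `findElements(using, value)` method whose elements offer `clickElement()`:

```r
# Hypothetical helper: click a "show more" button until it is gone
# or until max_clicks is reached. Returns the number of clicks made.
click_until_gone <- function(driver, css = ".btn-load-more",
                             max_clicks = 50, pause = 2) {
  clicks <- 0
  while (clicks < max_clicks) {
    # findElements() returns an empty list when nothing matches,
    # whereas findElement() raises an error
    buttons <- driver$findElements(using = "css selector", value = css)
    if (length(buttons) == 0) break
    buttons[[1]]$clickElement()
    clicks <- clicks + 1
    # wait for the next batch of articles to load
    Sys.sleep(pause)
  }
  clicks
}
```

With a live session this would be called as `click_until_gone(remDr)` right before the harvesting step. The 2-second pause is a guess; a slow connection may need more.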

CodePudding user response:

This is the correct solution for extracting articles when the page has a "show more" button:

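library(RSelenium)
library(magrittr)  # provides the %>% pipe used below

# start a Selenium server and a Chrome session;
# chromever must match the Chrome version installed locally
rD1 <- rsDriver(browser = "chrome", port = 4567L, geckover = NULL,
                chromever = "99.0.4844.51", iedrver = NULL,
                phantomver = NULL)
remDr1 <- rD1[["client"]]
remDr1$navigate("https://en.news-front.info/category/ukraine-2/")

# first click on the "show more" button
webElem <- remDr1$findElement(using = 'css selector', ".btn-load-more")
webElem$clickElement()
Sys.sleep(2)

# keep clicking; raise the count to load older articles
# (findElement() throws an error once the button is no longer on the page)
for (i in seq_len(50)) {
  # find button
  morereviews <- remDr1$findElement(using = 'css selector', ".btn-load-more")
  # click button
  morereviews$clickElement()
  # wait for the new articles to load
  Sys.sleep(2)
}

# scrape the article titles from the expanded page
title <- xml2::read_html(remDr1$getPageSource()[[1]]) %>%
  rvest::html_nodes(".article-link__title") %>%
  rvest::html_text() %>%
  tibble::tibble(title = .)
title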
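
The code above collects only the titles, while the question also asks for the date and URL. Below is a rough sketch of pulling all three into one data frame and keeping articles from March 2021 onward (one reading of "through March 2021"). It runs on an inline HTML snippet standing in for `remDr1$getPageSource()[[1]]`. Only the `.article-link__title` class comes from the answer above; the `.article-link` wrapper, the `.article-link__date` class, the `a` link and the `dd.mm.yyyy` date format are assumptions that need checking against the real page in the browser's inspector:

```r
library(rvest)

# Hypothetical markup; replace with remDr1$getPageSource()[[1]] in a live session.
page_source <- '
<div class="article-link">
  <a href="https://example.com/a1"><span class="article-link__title">First</span></a>
  <span class="article-link__date">30.03.2022</span>
</div>
<div class="article-link">
  <a href="https://example.com/a2"><span class="article-link__title">Second</span></a>
  <span class="article-link__date">15.03.2021</span>
</div>
<div class="article-link">
  <a href="https://example.com/a3"><span class="article-link__title">Third</span></a>
  <span class="article-link__date">10.02.2021</span>
</div>'

page  <- read_html(page_source)
# one node per article (assumed wrapper class)
cards <- html_nodes(page, ".article-link")

# html_node() on a node set returns one match per card,
# so the three columns stay aligned row by row
articles <- data.frame(
  title = html_text(html_node(cards, ".article-link__title"), trim = TRUE),
  date  = as.Date(html_text(html_node(cards, ".article-link__date"), trim = TRUE),
                  format = "%d.%m.%Y"),
  url   = html_attr(html_node(cards, "a"), "href"),
  stringsAsFactors = FALSE
)

# keep articles dated 1 March 2021 or later
articles <- articles[articles$date >= as.Date("2021-03-01"), ]
articles
```

Extracting per card rather than running three separate page-wide selectors avoids misaligned columns when one article lacks a date or a link.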