R - problem with web scraping nauka-polska.pl

Time:12-18

I tried to scrape this page -> https://nauka-polska.pl/#/home/search?lang=en&_k=ub2fy9 to get a table of publications about big data. The main problem is the results page (e.g. https://nauka-polska.pl/#/results?_k=7enpzq): if you open that link directly, it takes you back to the main page, so my code gets no results.

I tried rvest. Do you have any idea how to get around this problem, or is it impossible to scrape this page?

User response:

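Take a look at https://httr2.r-lib.org/. Everything after the # in the results URL is handled by JavaScript in the browser and is never sent to the server, which is why rvest only gets the main page. The search results themselves can be requested directly from the site's search endpoint, which returns JSON.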
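As a rough sketch of what gets sent to that endpoint, the JSON body can be built by hand with jsonlite. The field names (offset, query, sorting) are taken from the code in this answer; using offset = 0 for the first page is an assumption about how the endpoint paginates, not something confirmed by the site.

```r
library(jsonlite)

# Assumption: offset = 0 asks for the first page of results.
body <- toJSON(
  list(offset = 0, query = "big data", sorting = 0),
  auto_unbox = TRUE  # send scalars as 0 / "big data", not [0] / ["big data"]
)

cat(body, "\n")
```

Comparing this string with the request payload shown in the browser's network tab is a quick way to check that the fields match what the site itself sends.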

library(tidyverse)
library(httr2)

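# Fetch one page of search results for "big data" as a tibble
get_publications <- function(page) {
  "https://nauka-polska.pl/nowanauka-server//search" %>%
    request() %>%
    # req_body_json() sends the list as a JSON body and makes the request a POST
    req_body_json(list(
      # Assumption: offset is the number of records to skip, 10 records per page
      offset = (page - 1) * 10,
      query = "big data",
      sorting = 0
    )) %>%
    req_perform() %>%
    resp_body_json(simplifyVector = TRUE) %>%
    pluck("results") %>%
    as_tibble()
}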

# Get publications from the first page 
get_publications(1)

# Get publications from pages 1 to 10
map_dfr(1:10, get_publications)

# A tibble: 100 × 7
        id objectType namePl                      nameEn descr…¹ descr…² score
     <int> <chr>      <chr>                       <chr>  <chr>   <chr>   <dbl>
 1 6246234 Publikacje "<mark>Big</mark> <mark>Da… "<mar… "<b>Au… "<b>Au…  6.63
 2 6299516 Publikacje "Temporal Aspects of <mark… "Temp… "<b>Au… "<b>Au…  6.25
 3 6310619 Publikacje "Bazy danych. <mark>Big</m… "Bazy… "<b>Ro… "<b>Ye…  6.11
 4 6293319 Publikacje "Cities in the age of the … "Citi… "<b>Au… "<b>Au…  5.81
 5 6296475 Publikacje "<mark>Big</mark> <mark>Da… "<mar… "<b>Ro… "<b>Ye…  5.81
 6 6297230 Publikacje "<mark>Big</mark> <mark>Da… "<mar… "<b>Ro… "<b>Ye…  5.81
 7 6299492 Publikacje "Analiza i strategia <mark… "Anal… "<b>Au… "<b>Au…  5.52
 8 6204301 Publikacje "<mark>Big</mark> <mark>da… "<mar… "<b>Au… "<b>Au…  5.52
 9 6165464 Publikacje "<mark>Big</mark> <mark>Da… "<mar… "<b>Au… "<b>Au…  5.52
10 6351819 Publikacje "Financial Management in t… "Fina… "<b>Au… "<b>Au…  5.52
# … with 90 more rows, and abbreviated variable names ¹​descriptionPl,
#   ²​descriptionEn
# ℹ Use `print(n = ...)` to see more rows
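The name and description columns in the output above still contain HTML markup such as <mark> and <b>. A minimal sketch for cleaning them up, assuming the markup is plain tags with no nested angle brackets; strip_tags is a hypothetical helper and the sample strings are invented for illustration, not taken from the real results:

```r
# Hypothetical helper (not part of the original answer):
# remove HTML tags from a character vector and trim surrounding whitespace.
strip_tags <- function(x) {
  trimws(gsub("<[^>]+>", "", x))
}

# Invented sample input that mimics the markup seen in the output
sample_text <- c(
  "<mark>Big</mark> <mark>data</mark> example title",
  "<b>Author:</b> example name"
)

cleaned <- strip_tags(sample_text)
print(cleaned)
```

With the tidyverse loaded, the same helper could be applied to the result of get_publications(), for example with mutate(across(c(namePl, nameEn, descriptionPl, descriptionEn), strip_tags)).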