I tried to scrape this page -> https://nauka-polska.pl/#/home/search?lang=en&_k=ub2fy9 to get a table of publications about big data. The main problem is the results page (e.g. https://nauka-polska.pl/#/results?_k=7enpzq): opening that link directly takes you back to the main page, so my code gets no results.
I tried rvest. Do you have any idea how to work around this problem, or is it impossible to scrape this page?
CodePudding user response:
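The results appear to be rendered in the browser rather than served as HTML, which is why rvest finds nothing. You can query the site's JSON search endpoint directly instead. Take a look at https://httr2.r-lib.org/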
library(tidyverse)
library(httr2)
get_publications <- function(page) {
  "https://nauka-polska.pl/nowanauka-server//search" %>%
    request() %>%
    req_body_json(list(
      # Assumes `offset` counts records, 10 per page: page 1 -> 0, page 2 -> 10
      offset = (page - 1) * 10,
      query = "big data",
      sorting = 0
    )) %>%
    req_perform() %>%
    resp_body_json(simplifyVector = TRUE) %>%
    pluck("results") %>%
    as_tibble()
}
# Get publications from the first page
get_publications(1)
# Get publications from pages 1 to 10
map_dfr(1:10, get_publications)
# A tibble: 100 × 7
        id objectType namePl                      nameEn descr…¹ descr…² score
     <int> <chr>      <chr>                       <chr>  <chr>   <chr>   <dbl>
 1 6246234 Publikacje "<mark>Big</mark> <mark>Da… "<mar… "<b>Au… "<b>Au…  6.63
 2 6299516 Publikacje "Temporal Aspects of <mark… "Temp… "<b>Au… "<b>Au…  6.25
 3 6310619 Publikacje "Bazy danych. <mark>Big</m… "Bazy… "<b>Ro… "<b>Ye…  6.11
 4 6293319 Publikacje "Cities in the age of the … "Citi… "<b>Au… "<b>Au…  5.81
 5 6296475 Publikacje "<mark>Big</mark> <mark>Da… "<mar… "<b>Ro… "<b>Ye…  5.81
 6 6297230 Publikacje "<mark>Big</mark> <mark>Da… "<mar… "<b>Ro… "<b>Ye…  5.81
 7 6299492 Publikacje "Analiza i strategia <mark… "Anal… "<b>Au… "<b>Au…  5.52
 8 6204301 Publikacje "<mark>Big</mark> <mark>da… "<mar… "<b>Au… "<b>Au…  5.52
 9 6165464 Publikacje "<mark>Big</mark> <mark>Da… "<mar… "<b>Au… "<b>Au…  5.52
10 6351819 Publikacje "Financial Management in t… "Fina… "<b>Au… "<b>Au…  5.52
# … with 90 more rows, and abbreviated variable names ¹descriptionPl,
#   ²descriptionEn
# ℹ Use `print(n = ...)` to see more rows
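As the output shows, the text columns come back with HTML markup (`<mark>`, `<b>`) embedded in them. If you want plain text, something like the following might work. This is only a minimal sketch in base R: `strip_tags` is a hypothetical helper, not part of httr2 or the original answer, and the sample strings are made up for illustration. It assumes the markup is simple inline tags; a regex is not a general HTML parser.

```r
# Hypothetical helper: remove simple HTML tags and tidy the whitespace.
# Assumes inline tags such as <mark>...</mark> or <b>...</b> with no ">" inside attributes.
strip_tags <- function(x) {
  no_tags <- gsub("<[^>]+>", "", x)        # drop anything that looks like a tag
  trimws(gsub("\\s+", " ", no_tags))       # collapse runs of whitespace, trim the ends
}

# Made-up sample strings shaped like the values in the output above
sample_titles <- c(
  "<mark>Big</mark> <mark>Data</mark> in practice",
  "<b>Authors:</b>  A. Example"
)

cleaned <- strip_tags(sample_titles)
print(cleaned)
```

With the tibble returned by `get_publications()`, the same helper could be applied to the text columns, for example `mutate(across(c(namePl, nameEn, descriptionPl, descriptionEn), strip_tags))`.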