I need to scrape the “manuscript received” date, which appears in the right-hand pane after you click “Information” on this page: https://onlinelibrary.wiley.com/doi/10.1002/jcc.26717 . I tried the rvest script listed below, which has worked fine in similar situations. It does not work in this case, perhaps because a click is required to reveal the publication history. I tried to solve this by adding #pane-pcw-details to the URL (https://onlinelibrary.wiley.com/doi/10.1002/jcc.26717#pane-pcw-details), but to no avail. Another option would be to use RSelenium, but perhaps there is a simpler workaround?
library(rvest)
link <- "https://onlinelibrary.wiley.com/doi/10.1002/jcc.26717#pane-pcw-details"
wiley_output <- data.frame()
page <- read_html(link)
revhist <- page %>% html_node(".publication-history li:nth-child(5)") %>% html_text()
wiley_output <- rbind(wiley_output, data.frame(link, revhist, stringsAsFactors = FALSE))
CodePudding user response:
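That data comes from an AJAX call, which you can find in the network tab of your browser's developer tools. rvest only parses the HTML the server returns and does not run JavaScript, and the #pane-pcw-details fragment is never sent to the server, so the publication history is not in the page you downloaded. The request has a lot of query-string parameters, but you actually only need the identifier for the document (the DOI), along with ajax=true to ensure the data associated with the specified ajax action is returned: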
https://onlinelibrary.wiley.com/action/ajaxShowPubInfo?ajax=true&doi=10.1002/jcc.26717
library(rvest)
library(magrittr)
link <- 'https://onlinelibrary.wiley.com/action/ajaxShowPubInfo?ajax=true&doi=10.1002/jcc.26717'
page <- read_html(link)
page %>% html_node(".publication-history li:nth-child(5)") %>% html_text()
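If you need this for several articles, the same idea can be wrapped in small helpers. This is only a sketch under assumptions: the names pub_info_url, history_item and scrape_history are made up for illustration, and matching on the word "received" assumes the relevant list item contains that text, which you should check against the actual response. Selecting the item by its label rather than by nth-child(5) may be more robust if the number of history entries differs between articles.

```r
library(rvest)

# Build the ajax URL for a DOI (endpoint and parameters as in the answer above).
pub_info_url <- function(doi) {
  paste0("https://onlinelibrary.wiley.com/action/ajaxShowPubInfo?ajax=true&doi=", doi)
}

# Return the text of the first publication-history item that contains `label`
# (case-insensitive), or NA if no item matches.
# `page` is a document parsed with read_html().
# Assumption: the wanted item mentions "received"; adjust `label` if it does not.
history_item <- function(page, label = "received") {
  items <- page %>%
    html_nodes(".publication-history li") %>%
    html_text(trim = TRUE)
  hit <- items[grepl(label, items, ignore.case = TRUE)]
  if (length(hit) == 0) NA_character_ else hit[1]
}

# Fetch the publication info for each DOI and collect the results in a data frame.
scrape_history <- function(dois) {
  revhist <- vapply(
    dois,
    function(doi) history_item(read_html(pub_info_url(doi))),
    character(1),
    USE.NAMES = FALSE
  )
  data.frame(doi = dois, revhist = revhist, stringsAsFactors = FALSE)
}
```

Calling scrape_history(c("10.1002/jcc.26717")) should then return one row per DOI, with NA in revhist wherever no matching item was found. When looping over many DOIs it is probably wise to add a pause such as Sys.sleep() between requests.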