I'm trying to scrape the gun laws from https://www.statefirearmlaws.org/. However, I keep getting the following error:
Error in df1[[1]] : subscript out of bounds
I used SelectorGadget to copy the CSS selector for the table.
What can I do to fix it?
library(rvest)
library(tidyverse)

years <- lapply(2006:2018, function(x) {
  link <- paste0('https://www.statefirearmlaws.org/national-data/', x)
  df1 <- link %>%
    read_html() %>%
    html_nodes('.js-view-dom-id-cc833ef0290cd127457401b760770f1411daa41fc70df5f12d07744fab0a173c > div > div') %>%
    html_text(trim = TRUE)
  df <- df1[[1]]
  return(df)
})
CodePudding user response:
df1 <- link %>%
  read_html() %>%
  html_nodes('.js-view-dom-id-cc833ef0290cd127457401b760770f1411daa41fc70df5f12d07744fab0a173c > div > div')

This part returns {xml_nodeset (0)}, meaning the selector matches nothing on the page. html_text() then has nothing to extract and returns an empty result, so df1[[1]] fails with "subscript out of bounds".

Are you selecting the right thing in html_nodes()? Rechecking the selector with SelectorGadget may help you pick what you need.
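A minimal sketch of how to guard against an empty match before indexing. The inline HTML, the `.view-content` class and the `first_text()` helper are all made up for illustration; the real page's markup is not shown in this thread. The long `js-view-dom-id-...` class also looks auto-generated, so it may well differ between page loads or years, which would explain the empty node set.

```r
library(rvest)

# Hypothetical stand-in for a downloaded page (not the real site's markup).
page <- read_html("<div class='view-content'><div><div>Alabama</div></div></div>")

# Hypothetical helper: text of the first match, or NA when nothing matches.
first_text <- function(page, css) {
  nodes <- html_nodes(page, css)
  if (length(nodes) == 0) {
    # Empty node set: indexing with [[1]] here would raise
    # "subscript out of bounds", so return a missing value instead.
    return(NA_character_)
  }
  html_text(nodes[[1]], trim = TRUE)
}

found   <- first_text(page, ".view-content > div > div")
missing <- first_text(page, ".no-such-class > div > div")
```

Inside the lapply() from the question, a check like this would let the loop finish and show which years returned NA, instead of stopping at the first page where the selector does not match.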
CodePudding user response:
html_text() expects a node as its input, whereas html_table() outputs a list of tibbles, so html_text() fails to parse this.
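To make the difference between the two functions concrete, here is a small sketch on a made-up table (the HTML and its values are invented for illustration only). It assumes a reasonably recent rvest; older versions return plain data frames rather than tibbles from html_table().

```r
library(rvest)

# Made-up table, used only to compare the two return types.
page <- read_html(
  "<table>
     <tr><th>state</th><th>laws</th></tr>
     <tr><td>Alabama</td><td>10</td></tr>
   </table>"
)

tables <- html_nodes(page, "table")

# html_text() on a node set: a character vector, one string per matched node.
txt <- html_text(tables, trim = TRUE)

# html_table() on a node set: a list with one data frame per matched table.
parsed <- html_table(tables)
first_table <- parsed[[1]]
```

If the data on the page really is an HTML table, html_table() is usually the more convenient of the two, since it keeps the rows and columns apart instead of collapsing everything into one string.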