I'm trying to scrape the unemployment rate tables for 2017-2021, but before I scrape the tables I want to figure out how to navigate to each page. This is what I have so far:
library(RSelenium)
library(rvest)
library(tidyverse)
library(netstat)
# start server
remote_driver <- rsDriver(browser = 'chrome',
                          chromever = '99.0.4844.51',
                          verbose = FALSE,
                          port = free_port())
#create client object
rd <- remote_driver$client
# open browser
rd$open()
# maximize window
rd$maxWindowSize()
# navigate to page
rd$navigate('https://www.bls.gov/lau/tables.htm')
years <- c(2017:2021)
for (i in years) {
  rd$findElement(using = 'link text', years)$clickElement()
  Sys.sleep(3)
  rd$goBack
}
but it gives this error:
Selenium message:java.util.ArrayList cannot be cast to java.lang.String
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
Further Details: run errorDetails method
I was originally going to use rvest, but I couldn't figure out how to build a sequence of page URLs, since the links all end with .htm. On top of that, the main page's URL ends in /tables while the year pages' URLs contain /lastrk, so the pattern wasn't obvious. It just seemed easier to stick with Selenium.
Any suggestions?
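For reference, here is where the cast error comes from: `findElement(using = 'link text', ...)` expects a single string, but the loop passes the whole `years` vector, which Selenium receives as an ArrayList. Also, `rd$goBack` is missing its parentheses, so the method is never actually called. A minimal sketch of the corrected loop (assuming the visible link text on the page is just the year, e.g. "2017"):

```r
# assumes `rd` is an open RSelenium client already on
# https://www.bls.gov/lau/tables.htm
years <- 2017:2021
for (i in years) {
  # pass one year as a string, not the whole vector
  rd$findElement(using = 'link text', as.character(i))$clickElement()
  Sys.sleep(3)
  rd$goBack()   # note the parentheses: goBack is a method call
}
```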
CodePudding user response:
Get the tables of unemployment rates for metropolitan areas for the years 2016 to 2020.
The links follow a consistent pattern, so we can construct them ourselves instead of clicking through with Selenium.
library(rvest)
library(dplyr)
df <- lapply(16:20, function(x) {
  # build each year's URL directly, e.g. .../lau/lamtrk17.htm for 2017
  link <- paste0('https://www.bls.gov/lau/lamtrk', x, '.htm')
  df1 <- link %>% read_html() %>% html_nodes('.regular') %>%
    html_table()
  df1[[1]]
})
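Assuming the yearly tables share the same columns, the list returned by `lapply` can be stacked into one data frame with a year identifier. A sketch (the `year` column name is my own choice; `df` is the list produced above):

```r
library(dplyr)

names(df) <- 2016:2020                        # label each element with its year
combined <- bind_rows(df, .id = "year")       # .id copies the names into a column
```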