I have a long list of chemicals for which I need to extract the CAS number. I have written a for loop which works as intended. However, when a chemical name is not found on the website, the code throws an error and the loop stops.
Is there a way to account for this in the for loop, so that when a search query is not found, the loop goes back to the start page and searches for the next item in the list?
Below is my code for the for loop, with a short list of names to search for:
library(RSelenium)
library(netstat)
# start the server
rs_driver_object <- rsDriver(browser = "firefox",
                             verbose = FALSE,
                             port = 4847L) # change number if port is not open
# create a client object
remDrCh <- rs_driver_object$client
items <- c("MCPA", "DEET", "apple")
numbers <- list()
for (i in items) {
  Sys.sleep(2)
  remDrCh$navigate("https://commonchemistry.cas.org/")
  search_box <- remDrCh$findElement(using = 'class', 'search-input')
  search_box$sendKeysToElement(list(paste(i), key = 'enter'))
  Sys.sleep(2)
  result <- remDrCh$findElement(using = "class", "result-content")
  result$clickElement()
  Sys.sleep(2)
  cas <- remDrCh$findElements(using = 'class', 'cas-registry-number')
  cas_n <- lapply(cas, function (x) x$getElementText())
  numbers[[i]] <- unlist(cas_n)
  Sys.sleep(2)
  remDrCh$navigate("https://commonchemistry.cas.org/")
  Sys.sleep(2)
}
The problem lies in the result <- remDrCh$findElement(using = "class", "result-content") part. For "apple" there is no result, and thus no element that R could use, so findElement throws an error.
I tried to write a separate if/else statement for that specific part (shown below), but to no avail: it still only works for queries that yield a result. I also tried to use findElements, but that only helps in the case where no result is found.
result <- remDrCh$findElement(using = "class", "result-content")
if (length(result) > 0) {
  result$clickElement()
} else {
  remDrCh$navigate("https://commonchemistry.cas.org/")
}
I also tried the approach from "How to check if an object is visible in a webpage by using its xpath?", but I cannot get it to work on my example.
Any help would be much appreciated!
Answer:
This should work:
items <- c("MCPA", "apple", "DEET")
numbers <- list()
for (i in items) {
  Sys.sleep(2)
  remDrCh$navigate("https://commonchemistry.cas.org/")
  search_box <- remDrCh$findElement(using = 'class', 'search-input')
  search_box$sendKeysToElement(list(paste(i), key = 'enter'))
  Sys.sleep(2)
  result <- try(remDrCh$findElement(using = "class", "result-content"))
  if (!inherits(result, "try-error")) {
    result$clickElement()
    Sys.sleep(2)
    cas <- remDrCh$findElements(using = 'class', 'cas-registry-number')
    cas_n <- lapply(cas, function (x) x$getElementText())
    numbers[[i]] <- unlist(cas_n)
  } else {
    numbers[[i]] <- NA
  }
  Sys.sleep(2)
  remDrCh$navigate("https://commonchemistry.cas.org/")
  Sys.sleep(2)
}
Note the try() wrapper around the problematic code:
result <- try(remDrCh$findElement(using = "class", "result-content"))
This captures the error if there is one, but allows the loop to continue. The if statement that follows clicks the result and extracts the CAS number only when the output from try is not of class "try-error"; otherwise, it stores NA for that item.
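To see the try()/inherits() pattern in isolation, here is a minimal sketch that runs without a browser. The function lookup_cas() and the hard-coded lookup table are hypothetical stand-ins for the Selenium steps, not part of RSelenium: the function simply raises an error for an unknown name, much as findElement does when no element matches. The silent = TRUE argument is optional and only suppresses the printed error message.

```r
# Hypothetical stand-in for the browser lookup (values hard-coded for illustration).
# It raises an error for unknown names, similar to findElement() finding nothing.
known <- c(MCPA = "94-74-6", DEET = "134-62-3")
lookup_cas <- function(name) {
  if (!name %in% names(known)) stop("no result for ", name)
  known[[name]]
}

items <- c("MCPA", "apple", "DEET")
numbers <- list()
for (i in items) {
  # try() returns an object of class "try-error" instead of stopping the loop
  result <- try(lookup_cas(i), silent = TRUE)
  numbers[[i]] <- if (inherits(result, "try-error")) NA else result
}
str(numbers)
```

In this sketch the loop finishes for all three items, with NA stored for "apple". The same structure should carry over to the Selenium loop, where the body of the non-error branch does the clicking and text extraction.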