I am struggling to extract a link from a website with RSelenium
and would be grateful for any hints.
The site's address is "https://www.airbank.cz/mapa-pobocek-a-bankomatu/brno-netroufalky-c-p-770/".
The link I want to extract is:
<a href="https://www.google.com/maps/dir//50.659742, 14.039068/@50.659742,14.039068,16z/">
My code is below:
library(RSelenium)
library(tidyverse)
rD <- rsDriver(browser = "firefox", port = 483L, verbose = FALSE)
remDr <- rD[["client"]]
x <- "https://www.airbank.cz/mapa-pobocek-a-bankomatu/brno-netroufalky-c-p-770/"
remDr$navigate(x)
Sys.sleep(5) # give the page time to fully load
current_url <- remDr$getCurrentUrl()
current_url
remDr$getStatus()
page_source <- remDr$getPageSource()[[1]]
class(page_source)
Sys.sleep(5) # give the page time to fully load
link_google <- page_source %>%
xml2::read_html() %>%
rvest::html_elements("a") %>%
rvest::html_attr("href")
str_subset(link_google, "dir")
#> character(0)
I am not sure why I don't get the desired link (other links are returned). My suspicion is that it is related to the presence of an iframe, but I couldn't figure it out.
When I check the raw result of page_source <- remDr$getPageSource()[[1]],
I can't find the link in question. However, when I inspect the site in my browser, the link is present.
Answer:
To extract the value of the href attribute, i.e. https://www.google.com/maps/dir//50.659742, 14.039068/@50.659742,14.039068,16z/, you can locate the element in the live browser session and call its getElementAttribute method. Either of the following locator strategies can be used:
Using css selector:
element <- remDr$findElement(using = "css selector", "a.flex.items-center[href]")
element$getElementAttribute("href")
Using xpath:
element <- remDr$findElement(using = "xpath", "//a[@class='flex items-center' and @href]")
element$getElementAttribute("href")
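If the link is added by JavaScript after the initial load, a fixed Sys.sleep() may be too short or needlessly long. Below is a minimal sketch of a polling alternative, not a definitive implementation. The helper name get_hrefs_when_ready, the selector a[href*='google.com/maps/dir'], and the timeout values are all assumptions chosen for illustration; remDr is the client object from the question. Matching on part of the href avoids depending on the element's CSS classes.

```r
# Hypothetical helper (not part of RSelenium): poll the live page until at
# least one matching link exists, then return the href of every match.
# The default selector is an assumption based on the link shown in the question.
get_hrefs_when_ready <- function(remDr,
                                 css = "a[href*='google.com/maps/dir']",
                                 timeout = 15,
                                 interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # findElements() returns an empty list, rather than an error, on no match
    elems <- remDr$findElements(using = "css selector", value = css)
    if (length(elems) > 0) {
      # getElementAttribute() returns a list, so take its first entry
      hrefs <- vapply(
        elems,
        function(e) as.character(e$getElementAttribute("href")[[1]]),
        character(1)
      )
      return(hrefs)
    }
    # Give up with an empty result once the timeout has passed
    if (Sys.time() >= deadline) return(character(0))
    Sys.sleep(interval)
  }
}
```

After remDr$navigate(x), calling get_hrefs_when_ready(remDr) should return the matching links, or character(0) if none appear within the timeout. If the result is still empty and the link really does sit inside an iframe, it may be necessary to switch into that frame with remDr$switchToFrame() before searching; whether that applies to this page is not confirmed here.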