RSelenium: Issue with extracting link from website

Time:07-22

I am struggling to extract data from a website with RSelenium and would be grateful for any hints.

The site's address is "https://www.airbank.cz/mapa-pobocek-a-bankomatu/brno-netroufalky-c-p-770/". The data/link I want to extract is:

<a  href="https://www.google.com/maps/dir//50.659742, 14.039068/@50.659742,14.039068,16z/">

Below is my code:

library(RSelenium)
library(tidyverse)

rD <- rsDriver(browser = "firefox", port = 483L, verbose = FALSE)
remDr <- rD[["client"]]
x <- "https://www.airbank.cz/mapa-pobocek-a-bankomatu/brno-netroufalky-c-p-770/"

remDr$navigate(x)
Sys.sleep(5) # give the page time to fully load
current_url <- remDr$getCurrentUrl()
current_url
remDr$getStatus()
page_source <- remDr$getPageSource()[[1]]
class(page_source)
Sys.sleep(5) # give the page time to fully load

link_google <- page_source %>%
  xml2::read_html() %>%
  rvest::html_elements("a") %>%
  rvest::html_attr("href")

str_subset(link_google, "dir")
#> character(0)

I am not sure why I don't get the desired result (I only get other links). My suspicion is that it is related to the presence of an iframe, but I couldn't figure it out.

When checking the raw result of page_source <- remDr$getPageSource()[[1]] I actually can't find the link in question. However, when inspecting the site in my browser, the link is present.
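One way to test the iframe suspicion is to switch the driver into each iframe and look for the link there. The sketch below assumes `remDr` is the client created above and uses the RSelenium remoteDriver methods `switchToFrame()` and `findElements()`; the helper name `find_maps_links_in_iframes` and its `pattern` argument are hypothetical, and nested iframes are not searched.

```r
# Hypothetical helper: collect Google Maps "directions" links found inside the
# top-level iframes of the current page. `remDr` is an RSelenium client,
# e.g. rD[["client"]]. Nested iframes are NOT searched.
find_maps_links_in_iframes <- function(remDr, pattern = "google\\.com/maps/dir") {
  remDr$switchToFrame(NULL)                          # start from the main document
  frames <- remDr$findElements(using = "tag name", "iframe")
  found <- character(0)
  for (frame in frames) {
    remDr$switchToFrame(NULL)                        # iframe elements belong to the main document
    remDr$switchToFrame(frame)                       # enter this iframe
    anchors <- remDr$findElements(using = "css selector", "a[href]")
    # getElementAttribute() returns a list, so unwrap each value
    hrefs <- vapply(anchors,
                    function(a) a$getElementAttribute("href")[[1]],
                    character(1))
    found <- c(found, hrefs[grepl(pattern, hrefs)])
  }
  remDr$switchToFrame(NULL)                          # leave the driver on the main page
  unique(found)
}
```

If this returns a non-empty vector, the link lives inside an iframe; if it returns `character(0)`, the link is more likely added to the main document after the page source was captured.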

Answer:

To extract the href attribute, i.e. https://www.google.com/maps/dir//50.659742, 14.039068/@50.659742,14.039068,16z/, locate the element with findElement() and then call its getElementAttribute() method. You can use either of the following locator strategies:

  • Using css selector:

    element <- remDr$findElement(using = "css selector", "a.flex.items-center[href]")
    element$getElementAttribute("href")
    
  • Using xpath:

    # @class is compared as a whole string, so it must be exactly "flex items-center"
    element <- remDr$findElement(using = "xpath", "//a[@class='flex items-center' and @href]")
    element$getElementAttribute("href")
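Both locators rely on the classes `flex` and `items-center`, which look like generic layout classes that other links on the page may also carry, in which case findElement() returns whichever matching element comes first. A minimal alternative sketch, assuming the target link is the only one pointing at Google Maps directions, matches on the href itself with a CSS attribute-substring selector. The helper name `get_maps_dir_link` is hypothetical. It uses findElements() (plural) so that a missing link yields `NA` instead of an error, and it only searches the document the driver is currently focused on.

```r
# Hypothetical helper: return the first Google Maps "directions" href on the
# current page, or NA if there is none. Matching on the href avoids depending
# on layout classes such as "flex items-center".
get_maps_dir_link <- function(remDr) {
  links <- remDr$findElements(using = "css selector",
                              "a[href*='google.com/maps/dir']")
  if (length(links) == 0) {
    return(NA_character_)                  # nothing matched in this document
  }
  # getElementAttribute() returns a list, so unwrap the first element
  links[[1]]$getElementAttribute("href")[[1]]
}
```

Usage: `href <- get_maps_dir_link(remDr)` after `remDr$navigate(x)` and a short wait.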
    

Reference

RSelenium