Web scraping of nested links with R

Time:04-10

I would like to scrape the links that are nested in the name of each property. The script below runs, but instead of the URLs it returns only NAs. Could you help me find what I am missing in this snippet?

Thank you

# Test
library(rvest)
library(dplyr)

link <- "https://www.sreality.cz/hledani/prodej/byty/brno?_escaped_fragment_="
page <- read_html(link)

price <- page %>% 
  html_elements(".norm-price.ng-binding") %>% 
  html_text()

name <- page %>% 
  html_elements(".name.ng-binding") %>% 
  html_text()

location <- page %>% 
  html_elements(".locality.ng-binding") %>% 
  html_text()

href <- page %>% 
  html_nodes(".name.ng-binding") %>% 
  html_attr("href") %>% paste("https://www.sreality.cz", ., sep="")

flat <- data.frame(price, name, location, href, stringsAsFactors = FALSE)

CodePudding user response:

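Your CSS selector matches the element nested inside the anchor rather than the anchor itself. That inner element has no href attribute, so html_attr() returns NA. Select the anchor directly; this should work: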

page %>%
  html_nodes("a.title") %>%
  html_attr("ng-href") %>%
  paste0("https://www.sreality.cz", .)

Here paste0(...) is shorthand for paste(..., sep = "").
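
To see why the choice of selector matters, here is a minimal sketch on made-up markup. The HTML below is a hypothetical stand-in for one listing (the class names and paths are assumptions, not the site's actual markup); it only illustrates that html_attr() returns NA when the selected element does not carry the attribute.

```r
library(rvest)

# Hypothetical markup: the href sits on the <a>, while the visible
# name sits in a <span> nested inside it.
doc <- minimal_html('
  <a class="title" href="/detail/prodej/byt/1">
    <span class="name">Flat A</span>
  </a>
  <a class="title" href="/detail/prodej/byt/2">
    <span class="name">Flat B</span>
  </a>')

# Selecting the <span>: it has no href attribute, so every value is NA.
span_href <- html_attr(html_elements(doc, ".name"), "href")

# Selecting the <a>: the attribute is found.
a_href <- html_attr(html_elements(doc, "a.title"), "href")

# paste0() concatenates without a separator, like paste(..., sep = "").
urls <- paste0("https://www.sreality.cz", a_href)
print(span_href)
print(urls)
```

The names can still be read from the nested element with html_text(); only the attribute lookup needs to target the anchor.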

CodePudding user response:

Another way, using the JS path (the full CSS selector path copied from the browser's developer tools):

library(stringr)  # provides str_subset()
page %>% 
  html_nodes('#page-layout > div.content-cover > div.content-inner > div.transcluded-content.ng-scope > div > div > div.content > div > div:nth-child(4) > div > div:nth-child(n)') %>% 
  html_nodes('a') %>% html_attr('href') %>% 
  str_subset('detail') %>% unique() %>%  # keep each detail-page link once
  paste("https://www.sreality.cz", ., sep="")

 [1] "https://www.sreality.cz/detail/prodej/byt/4+1/brno-zabrdovice-tkalcovska/1857071452"
 [2] "https://www.sreality.cz/detail/prodej/byt/3+kk/brno--/1336764508"
 [3] "https://www.sreality.cz/detail/prodej/byt/2+kk/brno-stary-liskovec-u-posty/3639359836"
 [4] "https://www.sreality.cz/detail/prodej/byt/2+1/brno-reckovice-druzstevni/3845994844"
 [5] "https://www.sreality.cz/detail/prodej/byt/2+1/brno-styrice-jilova/1102981468"
 [6] "https://www.sreality.cz/detail/prodej/byt/1+kk/brno-dolni-herspice-/1961502812"
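
The filtering steps at the end of the pipeline above can be tried in isolation. This is a small sketch on a hypothetical vector of href values (the paths are invented for illustration); it assumes, as the pipeline does, that listing links contain "detail" and that the same link can appear more than once per listing.

```r
library(stringr)

# Hypothetical href values, as html_attr('href') might return them:
# one listing link appears twice, and one link is not a listing at all.
hrefs <- c("/detail/prodej/byt/a/1",
           "/detail/prodej/byt/a/1",
           "/hledani/prodej/byty",
           "/detail/prodej/byt/b/2")

# Keep only the paths containing "detail", then drop duplicates.
detail_paths <- unique(str_subset(hrefs, "detail"))

# Prepend the site root to build absolute URLs.
detail_urls <- paste0("https://www.sreality.cz", detail_paths)
print(detail_urls)
```

Filtering on the link text rather than on a long positional path is less likely to break when the page layout changes, although the "detail" pattern itself is still an assumption about the site's URL scheme.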