I would also like to get the links to the properties, but for some reason I am not getting all the links from each page. This code works, but only for the first page. What am I missing regarding the link extraction?
# To get $rooms, $m2, $price, $link
library(rvest)
library(dplyr)
flat_I = data.frame()
for (i in 7:100) {
  link <- paste0("https://www.immobilienscout24.at/regional/wien/wien/immobilie-kaufen/seite-", i)
  page <- read_html(link)
  # parse out the parent nodes
  results <- page %>% html_elements(".DHILY")
  # retrieve the rooms, m2 and price from each parent
  rooms <- results %>% html_element(".ufaLY:nth-child(1)") %>%
    html_text()
  m2 <- results %>% html_element(".ufaLY:nth-child(2)") %>%
    html_text()
  price <- results %>% html_element(".tSnnN") %>%
    html_text()
  link <- page %>%
    html_nodes("a._aOSG") %>%
    html_attr("href") %>%
    paste0("https://www.immobilienscout24.at", ., sep="")
  flat_I = rbind(flat_I, data.frame(rooms, m2, price, link, stringsAsFactors = FALSE))
  print(paste("Page:", i))
}
CodePudding user response:
The links are located in two classes, s5PQF and YXjuW. We can either extract the links from each class individually, or get all the links from the page and filter them to retain only the desired ones.
Further, you have defined link twice in your loop (first as the page URL, then as the extracted links); avoid such repetitions.
library(stringr)
page %>% html_nodes('a') %>%
  html_attr('href') %>% unique() %>%
  str_subset('expose')
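As a rough sketch of how the filtering approach could be wrapped up for use inside the loop, the snippet below defines a hypothetical helper, extract_expose_links(), and runs it on a small made-up HTML fragment instead of the live site. The helper name, the base_url argument, the ".card" class and the demo markup are all assumptions for illustration; the real page structure may differ. The second part shows a per-parent variant: html_element() (singular) returns one result per parent node, with NA where nothing matches, which may help keep the link column the same length as rooms, m2 and price.

```r
library(rvest)
library(stringr)

# Hypothetical helper: unique "expose" links found on one parsed page.
# Assumes relative hrefs should be prefixed with base_url.
extract_expose_links <- function(page, base_url = "https://www.immobilienscout24.at") {
  hrefs <- page %>%
    html_elements("a") %>%
    html_attr("href")
  hrefs <- unique(hrefs[!is.na(hrefs)])      # drop anchors without href, then duplicates
  hrefs <- str_subset(hrefs, "expose")       # keep only listing links
  is_relative <- !str_detect(hrefs, "^https?://")
  hrefs[is_relative] <- paste0(base_url, hrefs[is_relative])
  hrefs
}

# Made-up markup standing in for a results page.
demo_page <- minimal_html('
  <a href="/expose/abc123">Flat A</a>
  <a href="/expose/abc123">Flat A (image link)</a>
  <a href="/expose/def456">Flat B</a>
  <a href="/regional/wien">Navigation</a>
  <a>No href</a>
')
demo_links <- extract_expose_links(demo_page)
print(demo_links)

# Per-parent variant: one href (or NA) per card, so the vector stays
# aligned with other per-card fields. ".card" is a made-up class.
demo_cards <- minimal_html('
  <div class="card"><a href="/expose/abc123">Flat A</a></div>
  <div class="card"><span>No link here</span></div>
  <div class="card"><a href="/expose/def456">Flat B</a></div>
')
card_hrefs <- demo_cards %>%
  html_elements(".card") %>%
  html_element("a") %>%
  html_attr("href")
print(card_hrefs)
```

Inside the loop, storing the page URL under one name (for example page_url) and the extracted links under another (for example listing_links) avoids overwriting link. Note that data.frame() will raise an error if the link vector and the other columns have different lengths, so the per-parent variant is probably the safer choice when the result is bound together with rooms, m2 and price.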