library(rvest)
link1 <- "https://www.house.kg/en/details/78672316222ed8865fd97-82358847"
link2 <- "https://www.house.kg/en/details/258564561fa0bd0854978-45745933"
house_link <- c(link1, link2)
house_features <- data.frame()
size <- length(house_link)
for (i in seq_len(size)) {
  page_data <- read_html(house_link[i])
  parameters <- page_data %>% html_nodes(".label") %>% html_text(trim = TRUE)
  values <- page_data %>% html_nodes(".info") %>% html_text(trim = TRUE)
  house_features <- rbind(house_features, data.frame(parameters, values))
}
View(house_features)
One of the links has 19 variables, while the second one contains only 5, so the two listings do not line up. How can I turn each variable into its own column? If a listing has no value for a variable (say, the 14 variables that the second link lacks), I want NA in that column. How should I accomplish this?
CodePudding user response:
Try this approach:
- Gather the house features in a list
house_features <- lapply(house_link, function(link) {
  # tryCatch returns the condition object itself if reading the page fails
  page_data <- tryCatch(read_html(link), error = function(e) e, warning = function(w) w)
  # errors and warnings both inherit from "condition"
  if (!inherits(page_data, "condition")) {
    data.frame(
      link = link,
      parameters = page_data %>% html_nodes(".label") %>% html_text(trim = TRUE),
      values = page_data %>% html_nodes(".info") %>% html_text(trim = TRUE)
    )
  } else {
    NULL
  }
})
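The parsing step can be tried offline before hitting the site. The following is only a minimal sketch: it assumes a made-up HTML fragment that uses the same .label and .info classes as the selectors above, and extract_features is a hypothetical helper name, not something rvest provides. It also adds a length check, since data.frame() would either fail or silently recycle if the two selectors returned different numbers of nodes.

```r
library(rvest)

# Hypothetical helper: pair up .label and .info nodes from an already-parsed page
extract_features <- function(page_data, link) {
  parameters <- page_data %>% html_nodes(".label") %>% html_text(trim = TRUE)
  values <- page_data %>% html_nodes(".info") %>% html_text(trim = TRUE)
  # Stop early with a clear message instead of building a misaligned data frame
  if (length(parameters) != length(values)) {
    stop("label/value count mismatch for ", link)
  }
  data.frame(link = link, parameters = parameters, values = values,
             stringsAsFactors = FALSE)
}

# Made-up markup for illustration only; the real page layout may differ
toy_page <- read_html('
  <div><span class="label">Area</span><span class="info"> 107 m2 </span></div>
  <div><span class="label">Heating</span><span class="info">central</span></div>
')

toy_features <- extract_features(toy_page, "toy-link")
```

If the check ever fires on a real listing, the selectors probably need to be narrowed to the container that holds the label/value pairs.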
- rbind them using do.call, ensure that the parameter names are unique (they are not; for example, link1 has two parameters called Floor), and then pivot_wider
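library(dplyr)
library(tidyr)
# number repeated parameter names within a link (e.g. "Floor", "Floor 2"), then reshape
do.call(rbind, house_features) %>%
  group_by(link, parameters) %>%
  mutate(parameters = if_else(row_number() > 1, paste(parameters, row_number()), parameters)) %>%
  ungroup() %>%
  pivot_wider(id_cols = link, names_from = parameters, values_from = values)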
Output:
link `Type of offer` Category House Floor Area Condition Internet Toilet Gas `Front door` Parking Furniture `Floor 2` `Ceiling height` Security Other `Possibility of…
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 https… from owner elite monol… 9 fl… 107 … european… optics 2 bat… trunk armored parking fully fu… laminate 3 m. bars on… plas… no
2 https… from agent NA panel… NA 255 … NA NA NA NA NA NA NA NA NA NA NA NA
# … with 4 more variables: Possibility of getting a mortgage <chr>, Possibility of exchange <chr>, Number of floors <chr>, Heating <chr>
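To see the de-duplicate-then-pivot step in isolation, here is a small sketch on invented data; the toy data frame and its values are assumptions for illustration, not scraped results. It computes the repeat counter in a separate helper column (occurrence, a name chosen here) so that the grouping variable is not modified while grouped, which should behave the same way as the pipeline above.

```r
library(dplyr)
library(tidyr)

# Invented long-format data standing in for the scraped result
toy <- data.frame(
  link = c("a", "a", "a", "b", "b"),
  parameters = c("Floor", "Area", "Floor", "Area", "Heating"),
  values = c("9 of 12", "107 m2", "laminate", "255 m2", "central"),
  stringsAsFactors = FALSE
)

wide <- toy %>%
  group_by(link, parameters) %>%
  mutate(occurrence = row_number()) %>%   # 1 for first use of a name, 2 for the repeat
  ungroup() %>%
  mutate(parameters = if_else(occurrence > 1,
                              paste(parameters, occurrence),
                              parameters)) %>%
  select(-occurrence) %>%
  pivot_wider(id_cols = link, names_from = parameters, values_from = values)
```

Each link ends up as one row, and any parameter a link does not have is filled with NA by pivot_wider.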