How in R can I combine two groups of data and add them each into single columns after parsing?

Time:03-11

library(rvest)

link1 <- "https://www.house.kg/en/details/78672316222ed8865fd97-82358847"
link2 <- "https://www.house.kg/en/details/258564561fa0bd0854978-45745933"

house_link <- c(link1, link2)

house_features <- data.frame()
size <- length(house_link)

for (i in 1:size) {
  page_data = read_html(house_link[i])
  
  parameters = page_data %>% html_nodes(".label") %>% html_text(trim = TRUE) 
  values = page_data %>% html_nodes(".info") %>% html_text(trim = TRUE)
  
  # Append this page's label/value pairs; no return() here, since this loop is not inside a function
  house_features = rbind(house_features, data.frame(parameters, values))
}

View(house_features)

One of the links has 19 variables, while the second one has only 5, so you can see the discrepancy. How can I turn each variable into its own column? Where a listing has no value for a variable (for the second link, the other 14 variables), I want the value to be NA. How should I accomplish this, peeps?

CodePudding user response:

Try this approach:

  1. Gather the house features in a list
house_features = lapply(house_link, function(link) {
  # Return NULL on an error or a warning, so a failed page is skipped later
  page_data <- tryCatch(read_html(link), error = function(e) NULL, warning = function(w) NULL)

  if (!is.null(page_data)) {
    data.frame(
      link = link,
      parameters = page_data %>% html_nodes(".label") %>% html_text(trim = TRUE),
      values = page_data %>% html_nodes(".info") %>% html_text(trim = TRUE)
    )
  } else {
    NULL
  }
})
  2. rbind them using do.call, make the parameter names unique within each link (they are not: for example, link1 has two parameters called Floor), and then pivot_wider
library(dplyr)  # group_by, mutate, if_else, row_number
library(tidyr)  # pivot_wider

do.call(rbind,house_features) %>% 
  group_by(link, parameters) %>%
  mutate(parameters = if_else(row_number()>1, paste(parameters,row_number()), parameters)) %>% 
  pivot_wider(id_cols = link, names_from=parameters,values_from=values)

Output:

  link   `Type of offer` Category House  Floor Area  Condition Internet Toilet Gas   `Front door` Parking Furniture `Floor 2` `Ceiling height` Security Other `Possibility of…
  <chr>  <chr>           <chr>    <chr>  <chr> <chr> <chr>     <chr>    <chr>  <chr> <chr>        <chr>   <chr>     <chr>     <chr>            <chr>    <chr> <chr>           
1 https… from owner      elite    monol… 9 fl… 107 … european… optics   2 bat… trunk armored      parking fully fu… laminate  3 m.             bars on… plas… no              
2 https… from agent      NA       panel… NA    255 … NA        NA       NA     NA    NA           NA      NA        NA        NA               NA       NA    NA              
# … with 4 more variables: Possibility of getting a mortgage <chr>, Possibility of exchange <chr>, Number of floors <chr>, Heating <chr>
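
If you want to check the reshaping logic without hitting the site, here is a minimal offline sketch of the same two steps. The list `toy_features`, the links "page-a" and "page-b", and all the values are made up for illustration; they are not taken from the real listings. The `NULL` entry stands in for a page that failed to download. Unlike the pipeline above, the sketch writes the de-duplicated label to a separate helper column (`name`, also hypothetical) instead of overwriting the grouping column.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-ins for the scraped pages (not real listing data).
toy_features <- list(
  data.frame(
    link = "page-a",
    parameters = c("Type of offer", "Floor", "Floor", "Area"),
    values = c("from owner", "9 of 12", "laminate", "107 m2"),
    stringsAsFactors = FALSE
  ),
  NULL,  # a page that could not be read
  data.frame(
    link = "page-b",
    parameters = c("Type of offer", "Area"),
    values = c("from agent", "255 m2"),
    stringsAsFactors = FALSE
  )
)

wide <- do.call(rbind, toy_features) %>%   # NULL entries are dropped by rbind
  group_by(link, parameters) %>%
  # Second and later repeats of a label within one link get a numeric suffix
  mutate(name = if_else(row_number() > 1,
                        paste(parameters, row_number()),
                        parameters)) %>%
  ungroup() %>%
  # One row per link, one column per label; missing labels become NA
  pivot_wider(id_cols = link, names_from = name, values_from = values)

print(wide)
```

In this toy data "page-b" has no Floor entries, so its `Floor` and `Floor 2` cells come out as NA, which is the behaviour asked for in the question.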