How can I implement this IF condition?

Time:11-22

I am using R to scrape a table from a webpage. My issue is that one of the table columns on the webpage randomly changes its header name between these two names: "Accommodation Type" and "Room type".

So I have to check the webpage manually and correct my R code accordingly if I don't want my script to break when it reaches that particular line of code.

Here is the code (when that column title is "Accommodation Type"):

output31 <- output3 %>% mutate_at(c("Accommodation Type", "Price for 1 week"), ~str_extract(., ".*\n"))

And here is the code (when that column title is "Room type"):

output31 <- output3 %>% mutate_at(c("Room type", "Price for 1 week"), ~str_extract(., ".*\n"))

Is there a way to add an IF condition so that, if the column title is "Accommodation Type", the line of code that mentions "Room type" is skipped, and vice versa?

Edit (full code added):

library(rvest)
library(dplyr)
library(stringr)
if (!require(tables)) install.packages('tables')
library(tables)
library(xlsx)

url3 <- read_html("https://www.booking.com/hotel/mu/lux-grand-baie-resort-amp-residences.en-gb.html?aid=356980&label=gog235jc-1DCAsonQFCE2hlcml0YWdlLWF3YWxpLWdvbGZIM1gDaJ0BiAEBmAExuAEXyAEM2AED6AEB-AECiAIBqAIDuAKiwqmEBsACAdICJGFkMTQ3OGU4LTUwZDMtNGQ5ZS1hYzAxLTc0OTIyYTRiZDIxM9gCBOACAQ&sid=729aafddc363c28a2c2c7379d7685d87&all_sr_blocks=36363601_246990918_2_85_0&checkin=2021-12-08&checkout=2021-12-15&dest_id=-1354779&dest_type=city&dist=0&from_beach_key_ufi_sr=1&group_adults=2&group_children=0&hapos=1&highlighted_blocks=36363601_246990918_2_85_0&hp_group_set=0&hpos=1&no_rooms=1&sb_price_type=total&sr_order=popularity&sr_pri_blocks=36363601_246990918_2_85_0__29200&srepoch=1619681695&srpvid=51c8354f03be0097&type=total&ucfs=1&req_children=0&req_adults=2&hp_refreshed_with_new_dates=1")

output3 <- url3 %>% 
  html_nodes(xpath = './/table[@id="hprt-table"]')  %>%
  html_table() %>% .[[1]]

output31 <- output3 %>% mutate_at(c("Accommodation Type", "Price for 1 week"), ~str_extract(., ".*\n"))

CodePudding user response:

You could create a function to take care of the cleanup, or use the logic from inside the function directly in your script.

The function checks whether "Room type" is among the column names of the data. If it is, the first branch cleans up the "Room type" column; otherwise, the else branch cleans up the "Accommodation Type" column.

clean_up <- function(data){
  if("Room type" %in% colnames(data)){
    out <- data %>% 
      mutate_at(c("Room type", "Price for 1 week"), ~str_extract(., ".*\n"))
  } else {
    out <- data %>% 
      mutate_at(c("Accommodation Type", "Price for 1 week"), ~str_extract(., ".*\n"))
  }
  
  out
}

output31 <- clean_up(output3)
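
As a possible alternative to the explicit if/else, here is a minimal sketch that lets dplyr pick whichever of the candidate headers is actually present, using across() with any_of(). It assumes dplyr 1.0 or later. The small data frame is a made-up stand-in for the scraped table (its values are not real data), and the names candidate_cols and demo are hypothetical.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the scraped table; the values are invented
# purely for illustration.
demo <- data.frame(
  "Room type" = c("Deluxe Room\n1 double bed", "Suite\n2 single beds"),
  "Price for 1 week" = c("MUR 100\nIncludes taxes", "MUR 200\nIncludes taxes"),
  check.names = FALSE,       # keep the headers with spaces as they are
  stringsAsFactors = FALSE
)

# Both header variants are listed; any_of() silently skips the ones
# that are missing from the data instead of raising an error.
candidate_cols <- c("Accommodation Type", "Room type", "Price for 1 week")

# Keep only the first line (up to and including the newline) of each
# selected column, as in the original mutate_at() call.
demo_clean <- demo %>%
  mutate(across(any_of(candidate_cols), ~ str_extract(.x, ".*\n")))
```

With the real data, the same mutate() call would be applied to output3. One trade-off of this approach is that any_of() also stays silent if neither header variant is present, so an explicit check on colnames() may still be worthwhile when a missing column should be treated as an error.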

Note that scraping booking.com may violate its terms of service.
