A function that returns wrong values only when a specific value is passed for an argument

I'm working on a data set that shows mortality rates for certain diseases, along with other information, for hospitals in various states. Here it is: https://drive.google.com/open?id=1FTZJQLdw0PKw2bQ7XvxWnOITU7-yOCXC

I'm trying to write a function called rankall() that takes two arguments: (a) the disease (outcome), which is one of three values: heart attack, heart failure, or pneumonia; and (b) a hospital ranking (num). The function reads the dataset and returns a two-column data frame containing the hospital in each state that has the ranking specified in num. For example, the function call

rankall("heart attack", "best")

would return a data frame containing the names of the hospitals that are the best in their respective states for 30-day heart attack death rates. The function should return a value for every state (some may be NA). The first column in the data frame is named hospital, which contains the hospital name, and the second column is named state, which contains the two-character abbreviation for the state name. The function should use the following template.

I've written the function, and it works fine when the outcome argument is heart attack or heart failure, but when the outcome is pneumonia it returns wrong values. Here is my code:

 rankall <- function(outcome, num = "best"){
  outcome1 <- read.csv("outcome-of-care-measures.csv")
  if (!outcome %in% c("heart attack", "heart failure", "pneumonia")){
    stop("invalid outcome")
  }
  if (outcome == "heart attack"){
    column <- 11
  }
  else if (outcome == "heart failure"){
    column <- 17
  }
  else{
    column <- 23
  }
  vec <- unique(outcome1[,7])
  x <- vector()
  for (i in vec){
    outcome2 <- subset(outcome1, State == i)
    outcome2[,column] <- as.numeric(outcome2[,column])
    outcome3 <- outcome2[order(outcome2[,column], outcome2[,2]),]
    outcome3 <- outcome3[(!is.na(outcome3[,column])),]
    if (num == "best"){
      num <- 1
      }
    else if (num == "worst"){
      num <- nrow(outcome3)
    }
    ans <- outcome3[num,2]
    x <- c(x, ans)
  }
  df <- data.frame(hospitals =x, state = vec)
  final <- df[order(df[,2]),]
  final
}

CodePudding user response：

I think you are trying to rank hospitals on these outcomes, by state. I cannot replicate the problem you describe, and I also don't think your function works the way you intend for the non-pneumonia outcomes. There are simpler ways to do this, but if you do want a function with the structure you have above, here are a couple of ways to write it. (Unless it is required, I don't think you need the read calls inside the function, although I've left them in, as you have.)

library(data.table)
rankall <- function(outcome, num=c("best", "worst")) {
  num=match.arg(num)
  outc = list(
    "heart attack" = "Hospital 30-Day Death (Mortality) Rates from Heart Attack",
    "heart failure" = "Hospital 30-Day Death (Mortality) Rates from Heart Failure",
    "pneumonia" = "Hospital 30-Day Death (Mortality) Rates from Pneumonia")[[outcome]]
  dat = fread("outcome.csv")[, out:=as.numeric(get(outc))][!is.na(out)]
  
  if(num=="worst") dat[,out:=-1*out]
  dat[order(out), .SD[1], by=State, .SDcols=c("Hospital Name",outc)][order(State)]
}

If you prefer the tidyverse:

library(tidyverse)
rankall <- function(outcome, num=c("best", "worst")) {
  num=match.arg(num)
  outc = list(
    "heart attack" = "Hospital 30-Day Death (Mortality) Rates from Heart Attack",
    "heart failure" = "Hospital 30-Day Death (Mortality) Rates from Heart Failure",
    "pneumonia" = "Hospital 30-Day Death (Mortality) Rates from Pneumonia")[[outcome]]
  dat = read_csv("outcome.csv",progress = F,show_col_types = F) %>% 
    mutate(out = as.numeric(get(outc))) %>% 
    filter(!is.na(out))
  
  if(num=="worst") dat <- dat %>% mutate(out = out*-1)
  dat %>% arrange(out) %>% group_by(State) %>% slice_min(out) %>% select(State,`Hospital Name`)
}
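
Regarding the loop in the question, one thing that may explain odd results with num = "worst" is that num is reassigned inside the for loop, so after the first state it holds that state's row count instead of "worst". Below is a minimal base R sketch on made-up data (the data frame toy and the helper pick_rank are hypothetical, not part of the assignment) that resolves the rank per state in a local variable and also accepts a numeric rank:

```r
# Toy data, invented for illustration only
toy <- data.frame(
  hospital = c("A", "B", "C", "D", "E"),
  state    = c("AL", "AL", "AL", "TX", "TX"),
  rate     = c(12.1, 9.8, 15.0, 10.4, 8.7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the hospital at rank `num` within one state
pick_rank <- function(d, num = "best") {
  d <- d[!is.na(d$rate), ]
  d <- d[order(d$rate, d$hospital), ]   # ties broken by hospital name
  # Resolve the rank into a local variable; `num` itself is left untouched
  idx <- if (identical(num, "best")) {
    1L
  } else if (identical(num, "worst")) {
    nrow(d)
  } else {
    as.integer(num)
  }
  if (nrow(d) == 0L || idx > nrow(d)) NA_character_ else d$hospital[idx]
}

# One result per state; "worst" is re-evaluated for every state
worst <- vapply(split(toy, toy$state), pick_rank, character(1), num = "worst")
worst
```

With the question's approach, num would become 3 after the first state here, and the second state (which has only two hospitals) would yield NA.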

CodePudding user response：

The following function seems to work.

Note that the function has an extra argument, X, the data set. It uses split instead of unique to divide the data by state and then applies an lapply loop to the resulting list.
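
As a small illustration of the split-then-lapply idea on made-up data (the data frame toy is hypothetical, not the hospital file): split returns one data frame per state, and when the grouping variable is a factor it also returns an empty piece for any level that has no rows, which is one reason a zero-row check can be useful.

```r
# Toy data, invented for illustration; "GU" is a factor level with no rows
toy <- data.frame(
  hospital = c("A", "B", "C"),
  rate     = c(12.1, 9.8, 10.4),
  state    = factor(c("AL", "AL", "TX"), levels = c("AL", "GU", "TX")),
  stringsAsFactors = FALSE
)

# One data frame per factor level, including the empty "GU"
sp <- split(toy[c("hospital", "rate")], toy[["state"]])
names(sp)
sapply(sp, nrow)

# Pick the hospital with the lowest rate in each piece, NA for empty pieces
best <- lapply(sp, function(d) {
  if (nrow(d) == 0L) NA_character_ else d$hospital[which.min(d$rate)]
})
```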

rankall <- function(X, outcome, num = "best"){
  if (!outcome %in% c("heart attack", "heart failure", "pneumonia")){
    stop("invalid outcome")
  }
  if (outcome == "heart attack"){
    column <- 11
  }
  else if (outcome == "heart failure"){
    column <- 17
  }
  else{
    column <- 23
  }
  
  # Index 1 selects which.min ("best"), index 2 selects which.max ("worst")
  i_FUN <- (1:2)[1L + (num == "worst")]
  FUN <- list(which.min, which.max)[[i_FUN]]
  sp <- split(X[c(2, column)], X[["State"]])
  
  out <- lapply(seq_along(sp), function(i){
    State <- names(sp)[i]
    if(nrow(sp[[i]]) == 0L) {
      data.frame(hospitals = NA_character_, state = State)
    } else {
      Rank <- suppressWarnings(as.numeric(sp[[i]][[2]]))
      if(all(is.na(Rank))) {
        data.frame(hospitals = NA_character_, state = State)
      } else {
        r <- FUN(Rank)
        hospital <- sp[[i]][r, "Hospital.Name", drop = TRUE]
        data.frame(hospitals = hospital, state = State)
      }
    }
  })
  final <- do.call(rbind, out)
  row.names(final) <- NULL
  final
}
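
One possible explanation for the wrong rankings in the original loop, offered as a guess because it depends on the R version: before R 4.0.0, read.csv converted strings to factors by default, and as.numeric() applied to a factor returns the internal level codes rather than the printed numbers. Level codes follow the sort order of the strings, so "9.8" ranks after "10.1", which would only show up in a column whose rates fall on both sides of 10. The values below are made up:

```r
# Made-up rates stored as a factor, as older read.csv defaults would produce
rates <- factor(c("10.1", "9.8", "Not Available", "12.3"))

# Level codes, not the rates themselves
codes <- as.numeric(rates)

# Going through character recovers the numbers; "Not Available" becomes NA
vals <- suppressWarnings(as.numeric(as.character(rates)))

which.min(codes)   # position of "10.1", the first level in string order
which.min(vals)    # position of 9.8, the true minimum
```

If this is the cause, reading the file with read.csv("outcome-of-care-measures.csv", colClasses = "character") (or stringsAsFactors = FALSE) before calling rankall(X, "pneumonia", "best") should avoid it.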