filter(!is.na(column)) is not removing NA's from column in R

Time:02-01

I'm redoing a study assignment to see if I can improve it and get back into it. The assignment is to write a function that, given two variables, "state" and "outcome", returns the name of the hospital in that state with the lowest death rate for the given outcome/disease. For some reason my line with filter(!is.na()) does not seem to work. I have a feeling it has to do with the fact that I use paste to build the column name, but in my tests this doesn't seem to matter.

Here is the code:

library(dplyr)
data <- read.csv("outcome-of-care-measures.csv", colClasses = "character")

dataSelected <- data %>%
        select("Hospital.Name", "State", "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack", "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure", "Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia")

colnames(dataSelected) <- c("HostpitalName","State","DeathRateHeartAttack","DeathRateHeartFailure","DeathRatePneumonia")

dataSelected[,3] <- as.numeric(dataSelected[,3])
dataSelected[,4] <- as.numeric(dataSelected[,4])
dataSelected[,5] <- as.numeric(dataSelected[,5])


best <- function(state,outcome){
        column <- paste('DeathRate',outcome, sep = "")
        if (state %in% dataSelected$State < 1){
                return('Invalid state')
        } else if (column %in% colnames(dataSelected) < 1){
                return('Invalid outcome')
        } else{
        BestHospitals <- dataSelected %>%
                select(HostpitalName,State,column) %>%
                filter(!is.na(column)) %>%
                filter(State == state) %>%
                arrange(column,HostpitalName)
        return(BestHospitals[1,1])
        }
}

My function call

best("AL","HeartAttack")

Version info

platform x86_64-apple-darwin15.6.0
arch x86_64
os darwin15.6.0
system x86_64, darwin15.6.0
status
major 3
minor 6.1
year 2019
month 07
day 05
svn rev 76782
language R
version.string R version 3.6.1 (2019-07-05) nickname Action of the Toes

output of dput(head(dataSelected)):

structure(list(HostpitalName = c("SOUTHEAST ALABAMA MEDICAL CENTER", 
"MARSHALL MEDICAL CENTER SOUTH", "ELIZA COFFEE MEMORIAL HOSPITAL", 
"MIZELL MEMORIAL HOSPITAL", "CRENSHAW COMMUNITY HOSPITAL", "MARSHALL MEDICAL CENTER NORTH"
), State = c("AL", "AL", "AL", "AL", "AL", "AL"), DeathRateHeartAttack = c(14.3, 
18.5, 18.1, NA, NA, NA), DeathRateHeartFailure = c(11.4, 15.2, 
11.3, 13.6, 13.8, 12.5), DeathRatePneumonia = c(10.9, 13.9, 13.4, 
14.9, 15.8, 8.7)), row.names = c(NA, 6L), class = "data.frame")

CodePudding user response:

How about filtering by the column number instead of name?

best <- function(state,outcome){
        column <- paste('DeathRate',outcome, sep = "")
        if (state %in% dataSelected$State < 1){
                return('Invalid state')
        } else if (column %in% colnames(dataSelected) < 1){
                return('Invalid outcome')
        } else{
        BestHospitals <- dataSelected %>%
                select(HostpitalName,State,column) %>%
                filter(!is.na(.[,3])) %>%
                filter(State == state) %>%
                arrange(.[,3], HostpitalName) # ascending, so the lowest death rate comes first
        return(BestHospitals[1,1])
        }
}
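
A short sketch of why the original filter keeps every row, using the six sample rows from the question's dput() output. Inside filter(), `column` is just a character string, so is.na() tests the string rather than the column it names. One possible fix, assuming a dplyr version that provides the `.data` pronoun, is to look the column up by name with `.data[[column]]`. The names `sample_df`, `kept_all` and `kept_valid` are made up for this illustration.

```r
library(dplyr)

# Six-row sample taken from the dput() output in the question
sample_df <- data.frame(
  HostpitalName = c("SOUTHEAST ALABAMA MEDICAL CENTER",
                    "MARSHALL MEDICAL CENTER SOUTH",
                    "ELIZA COFFEE MEMORIAL HOSPITAL",
                    "MIZELL MEMORIAL HOSPITAL",
                    "CRENSHAW COMMUNITY HOSPITAL",
                    "MARSHALL MEDICAL CENTER NORTH"),
  State = rep("AL", 6),
  DeathRateHeartAttack = c(14.3, 18.5, 18.1, NA, NA, NA),
  stringsAsFactors = FALSE
)

column <- "DeathRateHeartAttack"

# is.na() sees the string itself, not the column it names
is.na(column)                      # FALSE

# so the condition is TRUE for every row and nothing is dropped
kept_all <- sample_df %>% filter(!is.na(column))
nrow(kept_all)                     # 6

# .data[[column]] looks the column up by its name inside the data frame
kept_valid <- sample_df %>%
  filter(!is.na(.data[[column]])) %>%
  arrange(.data[[column]], HostpitalName)
nrow(kept_valid)                   # 3
kept_valid$HostpitalName[1]        # "SOUTHEAST ALABAMA MEDICAL CENTER"
```

The same applies to arrange(column, ...): sorting by a constant string leaves the row order unchanged, which is why `.data[[column]]` is used there as well.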

CodePudding user response:

I took the liberty to rewrite your function a bit.

# it is usually a bad idea to reference a global variable (like your data frame) inside a function; pass it in as an argument instead.
best <- function(dat=NULL,state=NULL,outcome=NULL){ 
  
  column <- paste('DeathRate',outcome, sep = "")
  
  if (!state %in% dat$State | !column %in% colnames(dat)){
    stop('Invalid input')
  } # negating and using or "|" makes it easier to read
  else{
    BestHospitals <- dat %>%
      select(HostpitalName,State,column) %>%
      na.omit() %>% # for your purpose the much more concise na.omit() is a better option
      filter(State == state) %>%
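      arrange(.data[[column]], HostpitalName) %>% # column holds a name, so look the column up with .data[[ ]]
      filter(row_number()==1) #dplyr way to choose the first row
    
    return(BestHospitals)
  }
}

# using the six sample rows from the question's dput() as dataSelected
best(dat = dataSelected, state = "AL", outcome = "HeartAttack")

                     HostpitalName State DeathRateHeartAttack
1 SOUTHEAST ALABAMA MEDICAL CENTER    AL                 14.3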
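
One thing to keep in mind with na.omit(): it drops a row if any of its columns is NA, so it only does what is wanted here because select() has already narrowed the data to the outcome of interest. A small base R sketch with three of the sample rows from the question (the object names `sample_df`, `all_cols` and `pneumonia_only` are made up for this illustration):

```r
# Three rows from the question's sample; only the heart attack column has NAs
sample_df <- data.frame(
  HostpitalName = c("ELIZA COFFEE MEMORIAL HOSPITAL",
                    "MIZELL MEMORIAL HOSPITAL",
                    "MARSHALL MEDICAL CENTER NORTH"),
  DeathRateHeartAttack = c(18.1, NA, NA),
  DeathRatePneumonia = c(13.4, 14.9, 8.7),
  stringsAsFactors = FALSE
)

# na.omit() on all columns drops any row with an NA anywhere
all_cols <- na.omit(sample_df)
nrow(all_cols)          # 1

# selecting the columns of interest first keeps every pneumonia row
pneumonia_only <- na.omit(sample_df[, c("HostpitalName", "DeathRatePneumonia")])
nrow(pneumonia_only)    # 3
```

If the select() step were ever removed, a hospital with a valid pneumonia rate but a missing heart attack rate would silently disappear from the pneumonia ranking.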

CodePudding user response:

Here's a tidyverse-like version of your function.

As D.J. comments, it's not good practice to include a reference to a global object (dataSelected) inside your function. Far better to pass this as a parameter. This also has the beneficial side effect of allowing you to use the function in a pipe.

Also, HostpitalName might be a typo. Did you perhaps mean HospitalName? As you've used the strange spelling more than once, I've retained it.

Whilst I understand why you've allowed the user to shorten the name of the required outcome by pasting the common prefix "DeathRate" onto the value passed in outcome, this probably isn't best practice, as it requires the user to know this convention and limits the use of the function to data frames and columns that follow it. It also doesn't fit well with tidyverse syntax.

best <- function(d, state, outcome){
  # By adding d as the first parameter, you make it easy to use the function in
  # a pipe.  And on data frames other than dataSelected.
  qOutcome <- enquo(outcome)
  # calling stop is better than returning an error message as 
  # it stops processing immediately.
  if (!(state %in% (d %>% distinct(State) %>% pull(State) ))) {
    stop('Invalid state')
  }
  if (!(as_label(qOutcome) %in% colnames(d)) ){
    stop('Invalid outcome')
  }
  # Converting original code to equivalent tidyverse idioms.
  # HostpitalName should perhaps be HospitalName in the source data frame
  d %>%
    select(HostpitalName, State, !! qOutcome) %>%
    filter(!is.na(!! qOutcome)) %>%
    filter(State == state) %>%
    arrange(!! qOutcome, HostpitalName) %>% 
    pull(HostpitalName) %>% 
    head(1)
}

So we can then write

dataSelected %>% best("AL", DeathRateHeartFailure)
[1] "ELIZA COFFEE MEMORIAL HOSPITAL"

or, say,

dataSelected %>% best("AL", DeathRatePneumonia)
[1] "MARSHALL MEDICAL CENTER NORTH"
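
As a possible variation, assuming rlang 0.4.0 or later is installed alongside dplyr, the enquo() / !! pair can be replaced by the embrace operator {{ }}. The sketch below omits the input checks for brevity and uses the six sample rows from the question; `best_embrace` and `sample_df` are names made up for this illustration.

```r
library(dplyr)

# Six-row sample taken from the dput() output in the question
sample_df <- data.frame(
  HostpitalName = c("SOUTHEAST ALABAMA MEDICAL CENTER",
                    "MARSHALL MEDICAL CENTER SOUTH",
                    "ELIZA COFFEE MEMORIAL HOSPITAL",
                    "MIZELL MEMORIAL HOSPITAL",
                    "CRENSHAW COMMUNITY HOSPITAL",
                    "MARSHALL MEDICAL CENTER NORTH"),
  State = rep("AL", 6),
  DeathRateHeartFailure = c(11.4, 15.2, 11.3, 13.6, 13.8, 12.5),
  DeathRatePneumonia = c(10.9, 13.9, 13.4, 14.9, 15.8, 8.7),
  stringsAsFactors = FALSE
)

# {{ outcome }} forwards the unquoted column name to the dplyr verbs
best_embrace <- function(d, state, outcome) {
  d %>%
    filter(State == state, !is.na({{ outcome }})) %>%
    arrange({{ outcome }}, HostpitalName) %>%
    pull(HostpitalName) %>%
    head(1)
}

failure_best <- sample_df %>% best_embrace("AL", DeathRateHeartFailure)
pneumonia_best <- sample_df %>% best_embrace("AL", DeathRatePneumonia)
failure_best      # "ELIZA COFFEE MEMORIAL HOSPITAL"
pneumonia_best    # "MARSHALL MEDICAL CENTER NORTH"
```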