Missing column names and case_when

Time:05-29

Can you suggest a workaround for this error I'm triggering? (in R 3.6.2)

Using a case_when in a mutate, trying to test if a column is present, and only then use its value:

library(tidyverse)
aCaseFn <- function(df){
  df %>% 
    mutate(UP =
             case_when("plays" %in% names(df) ~ toupper(plays),
                       "band" %in% names(df) ~ toupper(band),
                       TRUE ~ NA_character_))
}

What I'm expecting is

R > aCaseFn(band_instruments)
# A tibble: 3 x 3
  name  plays  UP    
  <chr> <chr>  <chr> 
1 John  guitar GUITAR
2 Paul  bass   BASS  
3 Keith guitar GUITAR

but instead I get this error:

R > aCaseFn(band_instruments)
 Error: Problem with `mutate()` input `UP`.
x object 'band' not found
ℹ Input `UP` is `case_when(...)`.

It appears that toupper(band) is getting evaluated, even though (I'd think) it should never be reached with this argument, both because the 1st branch's condition ("plays" %in% names(df)) is TRUE and because the 2nd branch's condition ("band" %in% names(df)) is FALSE.

So what would be a good workaround?

CodePudding user response:

An easier option is any_of. Give the function two formal arguments: one for the dataset (df) and one for a character vector of the column names to convert to uppercase (nms). Then loop across any_of the columns in 'nms': this visits only the columns that exist in the data.frame and leaves out the names that are not present. Each visited column is converted to uppercase, and the new column is named with .names.

aCaseFn <- function(df, nms) {
  df %>%
    mutate(across(any_of(nms), toupper, .names = "UP_{.col}"))
}

Testing:

str1 <- c("plays", "band")
> aCaseFn(band_instruments, str1)
   name  plays UP_plays
1  John guitar   GUITAR
2  Paul   bass     BASS
3 Keith guitar   GUITAR

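NOTE: case_when/if_else/ifelse are vectorized and expect their arguments to be of the same length, i.e. "plays" %in% names(df) returns a single TRUE/FALSE, whereas toupper(plays) has length nrow(df). case_when also evaluates every right-hand side before picking values, which is why toupper(band) fails even though its condition is FALSE. Here a plain if/else would be more useful.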
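As a rough illustration of the if/else suggestion in the note, here is a minimal sketch. The function name aCaseFnIf and the small test data frames are made up for this example; it relies on plain if/else evaluating only the branch that is taken, so a missing column is never touched.

```r
library(dplyr)

# Hypothetical variant of aCaseFn (the name is an assumption, not from the question).
# if/else evaluates only the selected branch, so toupper(band) never runs
# when 'band' is absent from df.
aCaseFnIf <- function(df) {
  df %>%
    mutate(UP = if ("plays" %in% names(df)) {
      toupper(plays)
    } else if ("band" %in% names(df)) {
      toupper(band)
    } else {
      NA_character_   # length 1, recycled to every row
    })
}

# Illustrative data only
with_plays <- data.frame(name = c("John", "Paul", "Keith"),
                         plays = c("guitar", "bass", "guitar"))
with_band  <- data.frame(name = c("Mick", "John"),
                         band = c("Stones", "Beatles"))

aCaseFnIf(with_plays)
aCaseFnIf(with_band)
```

The column test happens once per call rather than once per row, which matches the fact that a column is either present for all rows or for none.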

data

band_instruments <- structure(list(name = c("John", "Paul", 
"Keith"), plays = c("guitar", 
"bass", "guitar")), class = "data.frame", row.names = c("1", 
"2", "3"))

CodePudding user response:

Since the set of columns is fixed for all rows, you don't have to check it row-wise with case_when. I think you might want to determine the name of the target column first, and then use it in mutate:

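library(dplyr)

# Candidate columns, in order of preference
target_columns <- c('plays', 'band')
# Positions of the candidates that actually exist in df
col_n <- which(target_columns %in% colnames(df))
# First existing candidate, or character(0) if none exist
up_column <- target_columns[if (length(col_n) > 0) min(col_n) else col_n]
df %>% 
    mutate(
        UP = if (length(up_column) > 0) {
            toupper(.data[[up_column]])
        } else {
            NA_character_
        }
    )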
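If this needs to be reusable, the same idea could be wrapped in a function. The following is only a sketch: the name upFirstAvailable and its default argument are assumptions made for illustration, not part of the answer above.

```r
library(dplyr)

# Hypothetical wrapper (name and default argument are assumptions).
# Picks the first candidate column that exists in df and upper-cases it;
# falls back to NA when none of the candidates is present.
upFirstAvailable <- function(df, target_columns = c("plays", "band")) {
  present <- target_columns[target_columns %in% colnames(df)]
  if (length(present) == 0) {
    return(mutate(df, UP = NA_character_))
  }
  mutate(df, UP = toupper(.data[[present[1]]]))
}

# Illustrative data only
instruments <- data.frame(name = c("John", "Paul", "Keith"),
                          plays = c("guitar", "bass", "guitar"))
upFirstAvailable(instruments)
```

Resolving the column name outside of mutate keeps the data-masked expression trivial, and the .data pronoun makes it explicit that the string refers to a column of df.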