Recode subset of variables using case when in R

Time:11-08

I am trying to recode some survey data in R. Here is some data similar to what I actually have.

df <- data.frame(
  A = rep("Y",5),
  B=seq(as.POSIXct("2014-01-13"), as.POSIXct("2014-01-17"), by="days"),
  C = c("Neither agree nor disagree",
        "Somewhat agree",
        "Somewhat disagree",
        "Strongly agree",
        "Strongly disagree"),
  D=c("Neither agree nor disagree",
         "Somewhat agree",
         "Somewhat disagree",
         "Strongly agree",
         "Strongly disagree")
)



I looked up some other posts and wrote the code below:

library(dplyr)

init2 <- df %>%
  mutate_at(vars(c(1:4)), function(x) case_when(
    x == "Neither agree nor disagree" ~ 3,
    x == "Somewhat agree"             ~ 4,
    x == "Somewhat disagree"          ~ 2,
    x == "Strongly agree"             ~ 5,
    x == "Strongly disagree"          ~ 1
  ))

But this throws the error

Error: Problem with `mutate()` column `B`.
i `B = (function (x) ...`.
x character string is not in a standard unambiguous format

Run `rlang::last_error()` to see where the error occurred. 

My input dates are POSIXct. Should I change their format? What is the fix for this issue? Thanks.

CodePudding user response:

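It does not make sense to recode POSIXt columns to your Likert scale: comparing the POSIXct column B with a string such as "Somewhat agree" makes R try to parse that string as a date-time, and that is what raises the error. Nor does it make sense to me to recode the "Y" column; it does not raise an error, but none of its values match any label, so on its own it would simply become all NA.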
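To see where the error comes from, here is a minimal base R sketch. The variable names `b` and `msg` are illustrative and not from the question. It relies on the fact that comparing a date-time with a string makes R try to convert the string to a date-time first.

```r
# One date-time value, like the entries in column B.
b <- as.POSIXct("2014-01-13", tz = "UTC")

# The comparison tries to convert the label to a date-time, which fails.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  b == "Somewhat agree",
  error = function(e) conditionMessage(e)
)
print(msg)

# A plain character value compares without error; it just never matches,
# which is why column A does not raise an error.
print("Y" == "Somewhat agree")
```

So the date format is not the problem; the recoding function should simply not be applied to column B.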

I suggest you either:

  1. Explicitly mutate the columns you want,

    df %>%
      mutate(across(c(C, D), ~ case_when(
        . == "Neither agree nor disagree" ~ 3,
        . == "Somewhat agree"             ~ 4,
        . == "Somewhat disagree"          ~ 2,
        . == "Strongly agree"             ~ 5,
        . == "Strongly disagree"          ~ 1
      )))
    #   A          B C D
    # 1 Y 2014-01-13 3 3
    # 2 Y 2014-01-14 4 4
    # 3 Y 2014-01-15 2 2
    # 4 Y 2014-01-16 5 5
    # 5 Y 2014-01-17 1 1
    
  2. Explicitly exclude columns you don't want,

    df %>%
      mutate(across(-c(A, B), ~ case_when(
        . == "Neither agree nor disagree" ~ 3,
        . == "Somewhat agree"             ~ 4,
        . == "Somewhat disagree"          ~ 2,
        . == "Strongly agree"             ~ 5,
        . == "Strongly disagree"          ~ 1
      )))
    
  3. Conditionally process them via some filter (though this is not infallible):

    df %>%
      mutate(across(where(~ all(grepl("agree", .))), ~ case_when(
        . == "Neither agree nor disagree" ~ 3,
        . == "Somewhat agree"             ~ 4,
        . == "Somewhat disagree"          ~ 2,
        . == "Strongly agree"             ~ 5,
        . == "Strongly disagree"          ~ 1
      )))
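All three options repeat the same five-branch `case_when()`. As an alternative sketch (not part of the original answer), the mapping can live in a named vector and be applied by lookup. The name `likert_scores` is hypothetical, and the example assumes dplyr 1.0 or later for `across()`.

```r
library(dplyr)

answers <- c("Neither agree nor disagree", "Somewhat agree",
             "Somewhat disagree", "Strongly agree", "Strongly disagree")

df <- data.frame(
  A = rep("Y", 5),
  B = seq(as.POSIXct("2014-01-13"), as.POSIXct("2014-01-17"), by = "days"),
  C = answers,
  D = answers,
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: label -> numeric score.
likert_scores <- c(
  "Strongly disagree"          = 1,
  "Somewhat disagree"          = 2,
  "Neither agree nor disagree" = 3,
  "Somewhat agree"             = 4,
  "Strongly agree"             = 5
)

# Index the lookup table by label; as.character() guards against factor
# columns, and unname() drops the labels from the result.
recoded <- df %>%
  mutate(across(c(C, D), ~ unname(likert_scores[as.character(.x)])))

print(recoded)
```

Labels that are missing from the lookup table come back as NA, which matches what `case_when()` does when no condition is met.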
    

FYI, according to https://dplyr.tidyverse.org/reference/mutate_all.html (on 2021 Nov 7):

Scoped verbs (_if, _at, _all) have been superseded by the use of across() in an existing verb. See vignette("colwise") for details.

across() pairs nicely with where(), which is provided (surreptitiously) by the tidyselect package.
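As a sketch of a stricter filter than the `grepl("agree", .)` test in option 3, `where()` can take a predicate that checks every value against the known labels. The helper `is_likert` and the vector `likert_scores` are hypothetical names introduced for this illustration, and the example assumes dplyr 1.0 or later.

```r
library(dplyr)

answers <- c("Neither agree nor disagree", "Somewhat agree",
             "Somewhat disagree", "Strongly agree", "Strongly disagree")

df <- data.frame(
  A = rep("Y", 5),
  B = seq(as.POSIXct("2014-01-13"), as.POSIXct("2014-01-17"), by = "days"),
  C = answers,
  D = answers,
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: label -> numeric score.
likert_scores <- c(
  "Strongly disagree"          = 1,
  "Somewhat disagree"          = 2,
  "Neither agree nor disagree" = 3,
  "Somewhat agree"             = 4,
  "Strongly agree"             = 5
)

# Hypothetical predicate: TRUE only for character columns whose values are
# all known labels. is.character() is checked first, so the POSIXct column
# is never compared with strings.
is_likert <- function(x) is.character(x) && all(x %in% names(likert_scores))

recoded <- df %>%
  mutate(across(where(is_likert), ~ unname(likert_scores[.x])))

print(recoded)
```

A column containing NA or any unexpected label is skipped entirely by this predicate, so the check may need loosening for real survey data.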
