Recode all occurrences of multiple strings across dataframe

Time:11-11

I'm trying to convert Likert scale survey data (e.g., "Strongly Agree - 1") into numeric data for use in statistical analysis. I've got dozens of questions using the same scale.

I found a solution, but it seems clumsy and was hoping someone could suggest an improvement for the sake of learning.

df = df %>% 
  mutate_all(funs(str_replace(.,"Very Dissatisfied1", "1"))) %>%
  mutate_all(funs(str_replace(.,"ModeratelyDissatisfied2", "2"))) %>%
  mutate_all(funs(str_replace(.,"SlightlyDissatisfied3", "3"))) %>%
  mutate_all(funs(str_replace(.,"Neither SatisfiedNor Dissatisfied4", "4"))) %>%
  mutate_all(funs(str_replace(.,"SlightlySatisfied5", "5"))) %>%
  mutate_all(funs(str_replace(.,"ModeratelySatisfied6", "6"))) %>%
  mutate_all(funs(str_replace(.,"VerySatisfied7", "7")))

I'm not sure what funs() is doing here, or to what extent mutate_all can take multiple arguments. How can this code be improved? Thanks for your help.

CodePudding user response:

funs() is deprecated and mutate_all() is superseded in recent dplyr versions. Instead, we can use the newer across():

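library(dplyr)
library(stringr)

# Define a set of replacements:
# the survey labels we want to replace, named by the code each should become
replacements <- c(
  "Very Dissatisfied1",
  "ModeratelyDissatisfied2",
  "SlightlyDissatisfied3",
  "Neither SatisfiedNor Dissatisfied4",
  "SlightlySatisfied5",
  "ModeratelySatisfied6",
  "VerySatisfied7"
) %>% 
  setNames(1:7)

# str_replace_all() expects a named vector of the form c(pattern = replacement),
# so flip it: the labels become the names (patterns), the codes the values
lookup <- setNames(names(replacements), replacements)

# Then, e.g., change them across all character columns
df <- df %>% 
  mutate(
    across(where(is.character), ~ str_replace_all(.x, lookup))
  )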
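
Since the goal in the question is numeric data, here is a minimal end-to-end sketch of the across() approach that also converts the result with as.numeric(). The data frame toy and its columns id, q1 and q2 are made up for illustration; only the labels come from the question.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data; id, q1 and q2 are invented column names
toy <- data.frame(
  id = 1:3,
  q1 = c("Very Dissatisfied1", "SlightlySatisfied5", "VerySatisfied7"),
  q2 = c("ModeratelySatisfied6", "Neither SatisfiedNor Dissatisfied4", NA),
  stringsAsFactors = FALSE
)

# Named vector in the c(pattern = replacement) form str_replace_all() expects
lookup <- c(
  "Very Dissatisfied1" = "1",
  "ModeratelyDissatisfied2" = "2",
  "SlightlyDissatisfied3" = "3",
  "Neither SatisfiedNor Dissatisfied4" = "4",
  "SlightlySatisfied5" = "5",
  "ModeratelySatisfied6" = "6",
  "VerySatisfied7" = "7"
)

# Recode every character column, then convert the "1".."7" strings to numbers;
# the integer id column is left untouched by where(is.character)
recoded <- toy %>%
  mutate(across(where(is.character), ~ as.numeric(str_replace_all(.x, lookup))))
```

Missing answers stay NA. Any label that is not in lookup would be left as text and turned into NA (with a warning) by as.numeric(), which is a quick way to spot typos in the labels.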

CodePudding user response:

Note that if every label follows the same pattern, i.e. the final digit of each label is the numeric value we want, then we can extract the code directly:

data.frame(replacements, 
           code = as.numeric(sub(".*(\\d+$)", "\\1", replacements)))
                        replacements code
1                 Very Dissatisfied1    1
2            ModeratelyDissatisfied2    2
3              SlightlyDissatisfied3    3
4 Neither SatisfiedNor Dissatisfied4    4
5                 SlightlySatisfied5    5
6               ModeratelySatisfied6    6
7                     VerySatisfied7    7

Or even shorter:

data.frame(replacements, 
           code = as.numeric(sub("\\D+", "", replacements)))

The replacements vector comes from @Baraliuh's answer.
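
The same trailing-digit idea can be applied straight to the data frame columns, which avoids building a lookup at all. This is only a sketch: it assumes every response ends in its numeric code, and the data frame toy with columns q1 and q2 is made up for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data; q1 and q2 are invented column names
toy <- data.frame(
  q1 = c("Very Dissatisfied1", "SlightlySatisfied5"),
  q2 = c("ModeratelyDissatisfied2", "VerySatisfied7"),
  stringsAsFactors = FALSE
)

# Pull the run of digits at the end of each response and convert it to a number
recoded <- toy %>%
  mutate(across(where(is.character), ~ as.numeric(str_extract(.x, "\\d+$"))))
```

Responses that do not end in a digit come back as NA, so this is best used on columns known to share the same scale.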
