On occasion I get survey data with Likert-scale string items that I need to convert to numeric in order to calculate basic descriptive statistics. To do this, I usually use the case_when function to create a new column for each item and assign each data point a numeric value. I am trying to write a function that can do this for many columns at once, so that I don't have to keep copying and pasting code. I am relatively new to this, so any help would be appreciated :)
Here is what I have done previously in R:
library(dplyr)
#create data frame
df <- data.frame(v1 = c("Definitely True", "Somewhat True","Somewhat False","Definitely False"),
v2 = c("Definitely False","Somewhat False","Somewhat True","Definitely True"))
#Use case_when to add numeric columns to dataframe
df$v1n <- case_when((df$v1 == "Definitely True")==TRUE ~ "1",
(df$v1 == "Somewhat True")==TRUE ~ "2",
(df$v1 == "Somewhat False")==TRUE ~ "3",
(df$v1 == "Definitely False")==TRUE ~ "4")
df$v2n <- case_when((df$v2 == "Definitely True")==TRUE ~ "1",
(df$v2 == "Somewhat True")==TRUE ~ "2",
(df$v2 == "Somewhat False")==TRUE ~ "3",
(df$v2 == "Definitely False")==TRUE ~ "4")
This works if I want to replace each string value with a numeric value and overwrite data in the existing columns:
for(i in colnames(data_x)) {
data_x[[i]] <- case_when((data_x[,i] == "Definitely True")==TRUE ~ "1",
(data_x[,i] == "Somewhat True")==TRUE ~ "2",
(data_x[,i] == "Somewhat False")==TRUE ~ "3",
(data_x[,i] == "Definitely False")==TRUE ~ "4")
}
But I would like to find a way to create a new column on each iteration, as I did with the copy-and-paste version. Here is what I have tried, without success:
for(i in colnames(df)) {
df[[var[i]]] <- case_when((df[,i] == "Definitely True")==TRUE ~ "1",
(df[,i] == "Somewhat True")==TRUE ~ "2",
(df[,i] == "Somewhat False")==TRUE ~ "3",
(df[,i] == "Definitely False")==TRUE ~ "4")
}
CodePudding user response:
dplyr
library(dplyr)
df %>%
  mutate(across(v1:v2, ~ case_when(
    . == "Definitely True" ~ "1",
    . == "Somewhat True" ~ "2",
    . == "Somewhat False" ~ "3",
    TRUE ~ "4"
  ), .names = "{.col}n")
  )
#                 v1               v2 v1n v2n
# 1  Definitely True Definitely False   1   4
# 2    Somewhat True   Somewhat False   2   3
# 3   Somewhat False    Somewhat True   3   2
# 4 Definitely False  Definitely True   4   1
- across gives us the ability to do one thing across multiple columns. We can use the v1:v2 syntax, or one of the other dplyr selector functions like matches, starts_with, etc.
- the second argument for across here is a tilde-function (rlang-style), inside which the dot (.) is replaced with the column data on each iteration. For instance, the first time that this tilde-function is evaluated, the dot references the vector df$v1.
- because the default action of mutate(across(...)) is to replace the columns, I add .names= to control the naming of the resulting data. This notation uses glue syntax, where {.col} is replaced by the name of the column being evaluated in each iteration.
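The selector-function variant mentioned above might look like the sketch below. It assumes that every Likert column shares the prefix v (true for the example data, an assumption for a real survey) and that integer codes are wanted so descriptive statistics can be computed directly. Unlike the catch-all TRUE ~ "4" branch, each label is matched explicitly here, so missing or unexpected responses stay NA instead of being coded as 4. The NA in v2 is added only to illustrate that.

```r
library(dplyr)

df <- data.frame(
  v1 = c("Definitely True", "Somewhat True", "Somewhat False", "Definitely False"),
  v2 = c("Definitely False", "Somewhat False", "Somewhat True", NA)
)

df_num <- df %>%
  mutate(across(
    starts_with("v"),                 # assumption: all Likert items start with "v"
    ~ case_when(
      .x == "Definitely True"  ~ 1L,
      .x == "Somewhat True"    ~ 2L,
      .x == "Somewhat False"   ~ 3L,
      .x == "Definitely False" ~ 4L   # anything unmatched (including NA) stays NA
    ),
    .names = "{.col}n"                # keep originals, append "n" to the new names
  ))

# Integer codes can be summarised directly
colMeans(df_num[c("v1n", "v2n")], na.rm = TRUE)
```

Returning integers rather than the strings "1" to "4" avoids a later as.numeric() step before calling mean() or sd().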
base R
I'll add the optional use of a lookup map.
lookup <- c("Definitely True" = "1", "Somewhat True" = "2", "Somewhat False" = "3", "Definitely False" = "4")
df <- cbind(df, setNames(lapply(df[,1:2], function(z) lookup[as.character(z)]), paste0(names(df[,1:2]), "n")))
rownames(df) <- NULL
df
#                 v1               v2 v1n v2n
# 1  Definitely True Definitely False   1   4
# 2    Somewhat True   Somewhat False   2   3
# 3   Somewhat False    Somewhat True   3   2
# 4 Definitely False  Definitely True   4   1
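Since the question asked for a function, the lookup approach could be wrapped in a small helper along these lines. This is only a sketch: the name add_numeric_cols and its arguments are made up for illustration, and it stores numbers rather than the strings used above, on the assumption that the result feeds into descriptive statistics. It also shows one way around the loop in the question, where var is never defined in the code shown: build each new column name with paste0.

```r
# Hypothetical helper: appends a numeric copy of each column in `cols`,
# named <col><suffix>, using `lookup` (a named vector: label -> number).
add_numeric_cols <- function(data, cols, lookup, suffix = "n") {
  for (col in cols) {
    # as.character() guards against factor columns;
    # labels missing from `lookup` become NA
    data[[paste0(col, suffix)]] <- unname(lookup[as.character(data[[col]])])
  }
  data
}

lookup <- c("Definitely True" = 1, "Somewhat True" = 2,
            "Somewhat False" = 3, "Definitely False" = 4)

df <- data.frame(
  v1 = c("Definitely True", "Somewhat True", "Somewhat False", "Definitely False"),
  v2 = c("Definitely False", "Somewhat False", "Somewhat True", "Definitely True")
)

df <- add_numeric_cols(df, c("v1", "v2"), lookup)

# The new columns are numeric, so summaries work without conversion
sapply(df[c("v1n", "v2n")], mean)
```

Passing the column names explicitly keeps ID or free-text columns from being recoded by accident, which colnames(df) would not.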