I would like to create a function that I can apply to different variables of a data frame. Here is the data frame:
data=data.frame(V1=c(0,25,6,"NC", 9, 10, "", "", 15), V2=c(47,"NC",56,"NC", "", 42, "", 48, ""), V3=c(2,5,3,4, 9,5, "", "", 2))
> data
  V1 V2 V3
1  0 47  2
2 25 NC  5
3  6 56  3
4 -9 NC  4
5  9     9
6 10 42  5
7 -9
8 -9 48
9 15     2
and here is the operation that I would like to include in my function (clin=function(data, variable_name)):
data$V1=as.numeric(data$V1)
data$V1[is.na(data$V1)]=-9
data_V1 = data %>%
  mutate(tot=n()) %>%
  mutate(rep= ifelse(V1==-9, "no_value", "value")) %>%
  mutate(sum_value=ifelse(rep=="value", sum(rep=="value"), tot-sum(rep=="value"))) %>%
  mutate(variable="V1") %>%
  select(variable, rep, sum_value) %>%
  distinct(rep, .keep_all=TRUE)
My problem is how to refer to the variable name inside the function. It doesn't work if I use clin(data, "V1").
CodePudding user response:
If you want to use it in a function, you need to work with dplyr's non-standard evaluation, because the column name is passed in as a string.
library(dplyr)
clean = function(data, variable_name) {
  data %>%
    mutate(!!variable_name := suppressWarnings(as.numeric(.data[[variable_name]])),
           !!variable_name := replace(.data[[variable_name]], is.na(.data[[variable_name]]), -9),
           tot = n(),
           rep = ifelse(.data[[variable_name]] == -9, "no_value", "value"),
           sum_value = ifelse(rep == "value", sum(rep == "value"), tot - sum(rep == "value")),
           variable = variable_name) %>%
    select(variable, rep, sum_value) %>%
    distinct(rep, .keep_all = TRUE)
}
clean(data, "V1")
# variable rep sum_value
#1 V1 value 6
#2 V1 no_value 3
clean(data, "V2")
# variable rep sum_value
#1 V2 value 4
#2 V2 no_value 5
To summarise:
- A single mutate statement is enough here.
- Use !!variable_name := on the left-hand side to assign the column name.
- Use .data[[variable_name]] to access the value of the column name passed.
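Since the goal was to apply the function to several variables, here is a rough sketch of how that could look. It is not the function above but a shorter variant: count_values is a hypothetical name, it treats anything that fails as.numeric() as missing instead of recoding to -9 first (so a genuine -9 would count as a value), and it assumes dplyr 1.0.0 or later for .before. Rows come back in the alphabetical order that count() produces.

```r
library(dplyr)

# Same data as in the question; stringsAsFactors = FALSE keeps the columns
# as character on older R versions too.
data <- data.frame(
  V1 = c(0, 25, 6, "NC", 9, 10, "", "", 15),
  V2 = c(47, "NC", 56, "NC", "", 42, "", 48, ""),
  V3 = c(2, 5, 3, 4, 9, 5, "", "", 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count usable vs. missing entries in one column,
# with the column name given as a string.
count_values <- function(data, variable_name) {
  data %>%
    mutate(
      # Entries that are not numbers ("NC", "") become NA
      value = suppressWarnings(as.numeric(.data[[variable_name]])),
      rep = ifelse(is.na(value), "no_value", "value")
    ) %>%
    count(rep, name = "sum_value") %>%
    mutate(variable = variable_name, .before = 1)
}

# Apply the helper to every column and stack the per-column summaries
result <- bind_rows(lapply(names(data), function(v) count_values(data, v)))
print(result)
```

Passing names(data) is just one choice; any character vector of column names, such as c("V1", "V3"), should work the same way.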