function with variable name in argument

Time:10-23

I would like to create a function that I can apply to different variables of a data frame. Here is the data frame:

data=data.frame(V1=c(0,25,6,"NC", 9, 10, "", "", 15), V2=c(47,"NC",56,"NC", "", 42, "", 48, ""), V3=c(2,5,3,4, 9,5, "", "", 2))

> data
  V1 V2 V3
1  0 47  2
2 25 NC  5
3  6 56  3
4 NC NC  4
5  9     9
6 10 42  5
7          
8    48   
9 15     2

And here is the operation that I would like to wrap in my function (clin=function(data, variable_name)):

data$V1=as.numeric(data$V1)
data$V1[is.na(data$V1)]=-9
data_V1 = data %>% mutate(tot=n()) %>% 
  mutate(rep= ifelse(V1==-9, "no_value", "value")) %>% 
  mutate(sum_value=ifelse(rep=="value", sum(rep=="value"), tot-sum(rep=="value"))) %>% 
  mutate(variable="V1") %>% 
  select(variable, rep, sum_value) %>% 
  distinct(rep, .keep_all=TRUE) 

My problem is how to refer to the variable by its name inside the function. Calling clin(data, "V1") doesn't work.

CodePudding user response:

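dplyr verbs use non-standard evaluation, so a column name passed to your function as a string cannot be used directly inside them. You need the tidy evaluation helpers to refer to the column.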
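To see why the string cannot simply be dropped into `data$...`, here is a small base R illustration. `df` and `col` are made-up names for this sketch and are not part of the question's code.

```r
# Toy data frame; `df` and `col` are illustrative names only
df <- data.frame(V1 = c(1, 2, 3))
col <- "V1"

by_dollar  <- df$col     # NULL: `$` looks for a column literally named "col"
by_bracket <- df[[col]]  # 1 2 3: `[[` evaluates `col` first, then looks up "V1"
```

The `.data[[variable_name]]` pronoun used in the function below plays the same role as `[[` does here, but inside dplyr verbs.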

library(dplyr)

clean <- function(data, variable_name) {
  data %>%
    mutate(
      # Convert to numeric; entries such as "NC" or "" become NA
      !!variable_name := suppressWarnings(as.numeric(.data[[variable_name]])),
      # Recode missing values as -9
      !!variable_name := replace(.data[[variable_name]], is.na(.data[[variable_name]]), -9),
      tot = n(),
      rep = ifelse(.data[[variable_name]] == -9, "no_value", "value"),
      # Number of rows in the same category ("value" or "no_value") as this row
      sum_value = ifelse(rep == "value", sum(rep == "value"), tot - sum(rep == "value")),
      variable = variable_name
    ) %>%
    select(variable, rep, sum_value) %>%
    distinct(rep, .keep_all = TRUE)
}

clean(data, "V1")

#  variable      rep sum_value
#1       V1    value         6
#2       V1 no_value         3

clean(data, "V2")

#  variable      rep sum_value
#1       V2    value         4
#2       V2 no_value         5
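Since the column name is now an ordinary string, the function can be looped over several columns at once. The following is a minimal sketch of that idea, not part of the original answer: the data and `clean()` are repeated so the snippet runs on its own, `results` is an illustrative name, and `stringsAsFactors = FALSE` is added as an assumption so the columns stay character on older R versions.

```r
library(dplyr)

# Data and clean() repeated from above so the snippet runs on its own
data <- data.frame(
  V1 = c(0, 25, 6, "NC", 9, 10, "", "", 15),
  V2 = c(47, "NC", 56, "NC", "", 42, "", 48, ""),
  V3 = c(2, 5, 3, 4, 9, 5, "", "", 2),
  stringsAsFactors = FALSE  # assumption: keep columns as character on R < 4.0
)

clean <- function(data, variable_name) {
  data %>%
    mutate(
      !!variable_name := suppressWarnings(as.numeric(.data[[variable_name]])),
      !!variable_name := replace(.data[[variable_name]], is.na(.data[[variable_name]]), -9),
      tot = n(),
      rep = ifelse(.data[[variable_name]] == -9, "no_value", "value"),
      sum_value = ifelse(rep == "value", sum(rep == "value"), tot - sum(rep == "value")),
      variable = variable_name
    ) %>%
    select(variable, rep, sum_value) %>%
    distinct(rep, .keep_all = TRUE)
}

# Call clean() once per column name and stack the per-column summaries
results <- bind_rows(lapply(names(data), function(v) clean(data, v)))
results
```

For the example data this should give six rows, two per column.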

To summarise:

  • A single mutate statement is enough here.
  • Use !!variable_name := on the left-hand side to set the column name from the string.
  • Use .data[[variable_name]] to access the value of the column name passed.
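If you would rather pass the column unquoted, as in clean(data, V1), the embrace operator `{{ }}` is an alternative to the string-based approach. The sketch below is only an illustration under that assumption: `clean_embrace` and `toy` are hypothetical names, and it uses `count()` instead of the `mutate()`/`distinct()` combination, so the rows come back sorted by `rep`.

```r
library(dplyr)

# Toy data in the same spirit as the question's; names here are hypothetical
toy <- data.frame(V1 = c("0", "25", "NC", "", "15"), stringsAsFactors = FALSE)

clean_embrace <- function(data, variable) {
  # Capture the column name as text before entering the pipeline
  label <- rlang::as_label(rlang::enquo(variable))
  data %>%
    mutate(
      # {{ variable }} forwards the unquoted column to mutate()
      value = suppressWarnings(as.numeric({{ variable }})),
      rep = ifelse(is.na(value), "no_value", "value")
    ) %>%
    count(rep, name = "sum_value") %>%
    mutate(variable = label, .before = 1)
}

res <- clean_embrace(toy, V1)
res
```

With strings you keep the option of looping over `names(data)`; with `{{ }}` the call reads like ordinary dplyr code. Which one fits better depends on how the function will be called.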