Use variables as arguments in functions in R

Time:11-11

I want to write a function for a very repetitive action. The data looks like this:

id<-c(100,104,999,225,350,450)
sex<-c('female','male','male','female','male','male')
race<-c('black','white','white','white','black','white')
class<-c('a','a','c','b','c','b')
adur<-c(3,3,15,3,3,59)
bdur<-c(2,59,26,59,2,14)
cdur<-c(1,59,59,59,59,1)
ae<-c(1,1,1,1,1,0)
be<-c(1,0,1,0,1,1)
ce<-c(1,0,0,0,1,1)

mydata<-data.frame(id,sex,race,class,adur,bdur,cdur,ae,be,ce)

   id    sex  race class adur bdur cdur ae be ce
1 100 female black     a    3    2    1  1  1  1
2 104   male white     a    3   59   59  1  0  0
3 999   male white     c   15   26   59  1  1  0
4 225 female white     b    3   59   59  1  0  0
5 350   male black     c    3    2   59  1  1  1
6 450   male white     b   59   14    1  0  1  1

I want to group by different variables (sex,race,class) and do some calculations. This is my attempt.

stp_f<-function(ivar,idur,ie){
  x<-mydata %>% group_by(ivar) %>% summarise(sumdur=sum(idur),
                                              sumev=sum(ie),
                                              failrate=sumev/sumdur) %>%
    rename(var=ivar)
}

stp_f(sex,adur,ae)
stp_f(sex,bdur,be)
stp_f(sex,cdur,ce)

It doesn't work, I think because R doesn't treat the function arguments as column names this way. It has been suggested that I abandon the tidyverse and use data.table instead, but I am not familiar with data.table syntax and find it hard to wrap my head around. Can someone explain in detail how to write this function in data.table, or with dplyr grammar?

CodePudding user response:

A base R solution

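# ivar, idur and ie are column names passed as strings
stp <- function(ivar, idur, ie) {
  # sum the duration and event columns within each level of ivar
  tmp <- aggregate(
    as.formula(paste0(". ~ ", ivar)),
    subset(mydata, select = c(ivar, idur, ie)),
    sum
  )
  colnames(tmp) <- c("var", "sumdur", "sumev")
  tmp$failrate <- tmp$sumev / tmp$sumdur
  tmp
}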

stp("sex","adur","ae")

     var sumdur sumev  failrate
1 female      6     2 0.3333333
2   male     80     3 0.0375000

CodePudding user response:

Using the tidyverse:

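library(dplyr)

# Column names are passed as strings; get() looks each one up inside mydata.
stp_f <- function(ivar, idur, ie) {
  x <- mydata %>%
    group_by(get(ivar)) %>%
    summarise(
      sumdur = sum(get(idur)),
      sumev = sum(get(ie)),
      failrate = sumev / sumdur
    ) %>%
    # group_by() names the grouping column literally "get(ivar)", so rename it
    rename(var = `get(ivar)`)

  x
}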

stp_f("sex","adur","ae")
stp_f("sex","bdur","be")
stp_f("sex","cdur","ce")

Outputs:

> stp_f("sex","adur","ae")
# A tibble: 2 x 4
  var    sumdur sumev failrate
  <chr>   <dbl> <dbl>    <dbl>
1 female      6     2   0.333 
2 male       80     3   0.0375


> stp_f("sex","bdur","be")
# A tibble: 2 x 4
  var    sumdur sumev failrate
  <chr>   <dbl> <dbl>    <dbl>
1 female     61     1   0.0164
2 male      101     3   0.0297


> stp_f("sex","cdur","ce")
# A tibble: 2 x 4
  var    sumdur sumev failrate
  <chr>   <dbl> <dbl>    <dbl>
1 female     60     1   0.0167
2 male      178     2   0.0112
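
When the column names arrive as strings, dplyr's `.data` pronoun is an alternative to get() that avoids the awkward `get(ivar)` column name. The following is only a minimal sketch: the function name `stp_s` is made up for illustration, and the data frame is rebuilt here just so the block runs on its own.

```r
library(dplyr)

# Same data as in the question, rebuilt so this block is self-contained
mydata <- data.frame(
  id    = c(100, 104, 999, 225, 350, 450),
  sex   = c("female", "male", "male", "female", "male", "male"),
  race  = c("black", "white", "white", "white", "black", "white"),
  class = c("a", "a", "c", "b", "c", "b"),
  adur  = c(3, 3, 15, 3, 3, 59),
  bdur  = c(2, 59, 26, 59, 2, 14),
  cdur  = c(1, 59, 59, 59, 59, 1),
  ae    = c(1, 1, 1, 1, 1, 0),
  be    = c(1, 0, 1, 0, 1, 1),
  ce    = c(1, 0, 0, 0, 1, 1)
)

# stp_s is a hypothetical name; all three arguments are column names as strings
stp_s <- function(ivar, idur, ie) {
  mydata %>%
    # .data[[ivar]] picks the column whose name is stored in ivar;
    # naming it here means no rename() step is needed afterwards
    group_by(var = .data[[ivar]]) %>%
    summarise(
      sumdur   = sum(.data[[idur]]),
      sumev    = sum(.data[[ie]]),
      failrate = sumev / sumdur
    )
}

# One call per grouping variable, collected in a named list
by_group <- lapply(
  c(sex = "sex", race = "race", class = "class"),
  stp_s, idur = "adur", ie = "ae"
)
by_group$sex
```

Because the arguments are plain strings, looping over the grouping variables with lapply() is straightforward, which suits the repetitive use case in the question. `by_group$sex` should match the first table above.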

CodePudding user response:

Although get() is one way to solve this problem, the recommended tidyverse way to program with {dplyr} is the embrace operator {{ }} (also called curly-curly), which lets you pass bare column names:

library(dplyr)

stp_f <-function(ivar, idur, ie){
  x <- mydata %>%
        group_by({{ ivar }}) %>%
        summarise(sumdur = sum({{ idur }}),
                  sumev  = sum({{ ie }}),
                  failrate = sumev/sumdur) %>%
    rename(var = {{ ivar }})
  x
}

stp_f(sex,adur,ae)
#> # A tibble: 2 x 4
#>   var    sumdur sumev failrate
#>   <chr>   <dbl> <dbl>    <dbl>
#> 1 female      6     2   0.333 
#> 2 male       80     3   0.0375

stp_f(sex,bdur,be)
#> # A tibble: 2 x 4
#>   var    sumdur sumev failrate
#>   <chr>   <dbl> <dbl>    <dbl>
#> 1 female     61     1   0.0164
#> 2 male      101     3   0.0297

stp_f(sex,cdur,ce)
#> # A tibble: 2 x 4
#>   var    sumdur sumev failrate
#>   <chr>   <dbl> <dbl>    <dbl>
#> 1 female     60     1   0.0167
#> 2 male      178     2   0.0112

Created on 2021-11-10 by the reprex package (v2.0.1)
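
Since the question also asks about data.table, here is a rough sketch of how the same summary could look there. It assumes the column names are passed as strings; `stp_dt` and its `data` argument are hypothetical names chosen for this example, and the data frame is rebuilt so the block runs on its own.

```r
library(data.table)

# Same data as in the question, rebuilt so this block is self-contained
mydata <- data.frame(
  id    = c(100, 104, 999, 225, 350, 450),
  sex   = c("female", "male", "male", "female", "male", "male"),
  race  = c("black", "white", "white", "white", "black", "white"),
  class = c("a", "a", "c", "b", "c", "b"),
  adur  = c(3, 3, 15, 3, 3, 59),
  bdur  = c(2, 59, 26, 59, 2, 14),
  cdur  = c(1, 59, 59, 59, 59, 1),
  ae    = c(1, 1, 1, 1, 1, 0),
  be    = c(1, 0, 1, 0, 1, 1),
  ce    = c(1, 0, 0, 0, 1, 1)
)

# stp_dt is a hypothetical name; ivar, idur and ie are column names as strings
stp_dt <- function(ivar, idur, ie, data = mydata) {
  dt <- as.data.table(data)  # works on a copy, mydata itself is not modified

  # .SDcols restricts .SD to the duration and event columns;
  # by = ivar groups by the column whose name is stored in ivar
  out <- dt[, lapply(.SD, sum), by = ivar, .SDcols = c(idur, ie)]

  # Give the result the same column names as the dplyr versions
  setnames(out, c("var", "sumdur", "sumev"))
  out[, failrate := sumev / sumdur]  # add the rate column by reference
  out[]
}

stp_dt("sex", "adur", "ae")
stp_dt("race", "bdur", "be")
```

The general data.table form is `dt[i, j, by]`: `j` holds the calculation and `by` the grouping, so the whole group_by() plus summarise() pipeline collapses into one bracket call.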
