I want to write a function for a very repetitive action. The data looks like this:
id<-c(100,104,999,225,350,450)
sex<-c('female','male','male','female','male','male')
race<-c('black','white','white','white','black','white')
class<-c('a','a','c','b','c','b')
adur<-c(3,3,15,3,3,59)
bdur<-c(2,59,26,59,2,14)
cdur<-c(1,59,59,59,59,1)
ae<-c(1,1,1,1,1,0)
be<-c(1,0,1,0,1,1)
ce<-c(1,0,0,0,1,1)
mydata<-data.frame(id,sex,race,class,adur,bdur,cdur,ae,be,ce)
   id    sex  race class adur bdur cdur ae be ce
1 100 female black     a    3    2    1  1  1  1
2 104   male white     a    3   59   59  1  0  0
3 999   male white     c   15   26   59  1  1  0
4 225 female white     b    3   59   59  1  0  0
5 350   male black     c    3    2   59  1  1  1
6 450   male white     b   59   14    1  0  1  1
I want to group by different variables (sex,race,class) and do some calculations. This is my attempt.
stp_f<-function(ivar,idur,ie){
  x<-mydata %>% group_by(ivar) %>% summarise(sumdur=sum(idur),
                                             sumev=sum(ie),
                                             failrate=sumev/sumdur) %>%
    rename(var=ivar)
}
stp_f(sex,adur,ae)
stp_f(sex,bdur,be)
stp_f(sex,cdur,ce)
It doesn't work, I think because R doesn't read variables this way. It has been suggested that I abandon the tidyverse and use data.table instead, but because I am not familiar with data.table syntax, I find it hard to wrap my head around. Can someone explain in detail how to write this function in data.table, or show how to do it with dplyr grammar?
CodePudding user response:
A base R solution
stp=function(ivar,idur,ie){
  # sum the duration and event columns within each level of ivar
  tmp=aggregate(
    as.formula(paste0(".~",ivar)),
    subset(mydata,select=c(ivar,idur,ie)),
    sum
  )
  colnames(tmp)=c("var","sumdur","sumev")
  # failure rate = events per unit of duration
  tmp$failrate=tmp$sumev/tmp$sumdur
  tmp
}
# column names are passed as strings
stp("sex","adur","ae")
     var sumdur sumev  failrate
1 female      6     2 0.3333333
2   male     80     3 0.0375000
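Because the column names are plain strings here, the same function can be looped over the other grouping variables the question mentions (sex, race, class). The following is only a sketch: it rebuilds the example data so it runs on its own, restates stp() with the same logic, and the name by_group is a hypothetical choice for illustration.

```r
# Rebuild the example data so this block runs on its own.
mydata <- data.frame(
  id    = c(100, 104, 999, 225, 350, 450),
  sex   = c("female", "male", "male", "female", "male", "male"),
  race  = c("black", "white", "white", "white", "black", "white"),
  class = c("a", "a", "c", "b", "c", "b"),
  adur  = c(3, 3, 15, 3, 3, 59),
  bdur  = c(2, 59, 26, 59, 2, 14),
  cdur  = c(1, 59, 59, 59, 59, 1),
  ae    = c(1, 1, 1, 1, 1, 0),
  be    = c(1, 0, 1, 0, 1, 1),
  ce    = c(1, 0, 0, 0, 1, 1)
)

# Same logic as stp() above: sum duration and events per group, then divide.
stp <- function(ivar, idur, ie) {
  tmp <- aggregate(
    as.formula(paste0(". ~ ", ivar)),
    data = mydata[, c(ivar, idur, ie)],
    FUN = sum
  )
  colnames(tmp) <- c("var", "sumdur", "sumev")
  tmp$failrate <- tmp$sumev / tmp$sumdur
  tmp
}

# One summary per grouping variable, returned as a named list
# (by_group is a hypothetical name).
groups <- c("sex", "race", "class")
by_group <- lapply(setNames(groups, groups), stp, idur = "adur", ie = "ae")
by_group$class
```

Each list element is an ordinary data frame, so by_group$race or by_group[["class"]] can be inspected or bound together afterwards.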
CodePudding user response:
Using the tidyverse:
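library(dplyr)
stp_f <- function(ivar, idur, ie) {
  x <- mydata %>%
    # get() looks up the column whose name is stored in the string
    group_by(get(ivar)) %>%
    summarise(
      sumdur = sum(get(idur)),
      sumev = sum(get(ie)),
      failrate = sumev / sumdur
    ) %>%
    # the grouping column is literally named "get(ivar)", so rename it
    rename(var = `get(ivar)`)
  x
}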
stp_f("sex","adur","ae")
stp_f("sex","bdur","be")
stp_f("sex","cdur","ce")
Outputs:
> stp_f("sex","adur","ae")
# A tibble: 2 x 4
  var    sumdur sumev failrate
  <chr>   <dbl> <dbl>    <dbl>
1 female      6     2   0.333
2 male       80     3   0.0375
> stp_f("sex","bdur","be")
# A tibble: 2 x 4
  var    sumdur sumev failrate
  <chr>   <dbl> <dbl>    <dbl>
1 female     61     1   0.0164
2 male      101     3   0.0297
> stp_f("sex","cdur","ce")
# A tibble: 2 x 4
  var    sumdur sumev failrate
  <chr>   <dbl> <dbl>    <dbl>
1 female     60     1   0.0167
2 male      178     2   0.0112
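When the column names arrive as strings, dplyr's .data pronoun is another way to look them up, and it avoids the backtick rename of `get(ivar)`. The sketch below assumes dplyr is installed and rebuilds the example data; stp_chr and results are hypothetical names used only for illustration. It also shows how the three duration/event pairs could be run in one call with Map().

```r
library(dplyr)

# Rebuild the example data so this block runs on its own.
mydata <- data.frame(
  id    = c(100, 104, 999, 225, 350, 450),
  sex   = c("female", "male", "male", "female", "male", "male"),
  race  = c("black", "white", "white", "white", "black", "white"),
  class = c("a", "a", "c", "b", "c", "b"),
  adur  = c(3, 3, 15, 3, 3, 59),
  bdur  = c(2, 59, 26, 59, 2, 14),
  cdur  = c(1, 59, 59, 59, 59, 1),
  ae    = c(1, 1, 1, 1, 1, 0),
  be    = c(1, 0, 1, 0, 1, 1),
  ce    = c(1, 0, 0, 0, 1, 1)
)

# stp_chr is a hypothetical name; all three arguments are column names as strings.
stp_chr <- function(ivar, idur, ie) {
  mydata %>%
    # .data[[ivar]] picks the column named by the string; naming it `var`
    # here means no rename step is needed afterwards
    group_by(var = .data[[ivar]]) %>%
    summarise(
      sumdur   = sum(.data[[idur]]),
      sumev    = sum(.data[[ie]]),
      failrate = sumev / sumdur
    )
}

# Run every duration/event pair in one go; Map() names the list after `durs`.
durs <- c("adur", "bdur", "cdur")
evs  <- c("ae", "be", "ce")
results <- Map(function(d, e) stp_chr("sex", d, e), durs, evs)
results$bdur
```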
CodePudding user response:
Although get() is one option for solving this problem, the recommended tidyverse way to program with {dplyr} is to embrace the function arguments with double curly braces, {{ }}:
library(dplyr)
stp_f <- function(ivar, idur, ie) {
  x <- mydata %>%
    group_by({{ ivar }}) %>%
    summarise(sumdur = sum({{ idur }}),
              sumev = sum({{ ie }}),
              failrate = sumev / sumdur) %>%
    rename(var = {{ ivar }})
  x
}
stp_f(sex,adur,ae)
#> # A tibble: 2 x 4
#>   var    sumdur sumev failrate
#>   <chr>   <dbl> <dbl>    <dbl>
#> 1 female      6     2   0.333
#> 2 male       80     3   0.0375
stp_f(sex,bdur,be)
#> # A tibble: 2 x 4
#>   var    sumdur sumev failrate
#>   <chr>   <dbl> <dbl>    <dbl>
#> 1 female     61     1   0.0164
#> 2 male      101     3   0.0297
stp_f(sex,cdur,ce)
#> # A tibble: 2 x 4
#>   var    sumdur sumev failrate
#>   <chr>   <dbl> <dbl>    <dbl>
#> 1 female     60     1   0.0167
#> 2 male      178     2   0.0112
Created on 2021-11-10 by the reprex package (v2.0.1)
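The question also asks how this would look in data.table. A minimal sketch, assuming the data.table package is installed: the function name stp_dt, the object dt, and the extra data argument are hypothetical choices for illustration, not something taken from the answers above. Column names are passed as strings, .SDcols selects the columns to sum, and by takes the grouping column.

```r
library(data.table)

# Rebuild the example data as a data.table so this block runs on its own.
dt <- data.table(
  id    = c(100, 104, 999, 225, 350, 450),
  sex   = c("female", "male", "male", "female", "male", "male"),
  race  = c("black", "white", "white", "white", "black", "white"),
  class = c("a", "a", "c", "b", "c", "b"),
  adur  = c(3, 3, 15, 3, 3, 59),
  bdur  = c(2, 59, 26, 59, 2, 14),
  cdur  = c(1, 59, 59, 59, 59, 1),
  ae    = c(1, 1, 1, 1, 1, 0),
  be    = c(1, 0, 1, 0, 1, 1),
  ce    = c(1, 0, 0, 0, 1, 1)
)

# stp_dt is a hypothetical name; ivar, idur and ie are column names as strings.
stp_dt <- function(data, ivar, idur, ie) {
  # .SD holds only the columns listed in .SDcols; sum each one within each group
  out <- data[, lapply(.SD, sum), by = ivar, .SDcols = c(idur, ie)]
  # the result has three columns (group, duration sum, event sum): rename them all
  setnames(out, c("var", "sumdur", "sumev"))
  # add the failure rate by reference
  out[, failrate := sumev / sumdur]
  out[]
}

stp_dt(dt, "sex", "adur", "ae")
stp_dt(dt, "race", "bdur", "be")
```

Reading the main line as dt[i, j, by] may help: i is left empty (all rows), j computes the sums, and by defines the groups, which is roughly the group_by() plus summarise() pair in dplyr.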