data.table - using mapply in R

Time:08-30

I want to apply my function to every row of a data.table:

set.seed(13579) 
cat1N <- 10
cat2N <- 15
cat3N <- 7
group = c(rep("Group1", cat1N), rep("Group1", cat1N), rep("Group1", cat1N), 
          rep("Group2", cat2N), rep("Group2", cat3N)) # policyID
year = c(rep(2015, cat1N), rep(2016, cat1N), rep(2017, cat1N), 
         rep(2016, cat2N), 
         rep(2017, cat3N))
category = c(rep("cat1", cat1N/2), rep("cat2", cat1N/2), rep("cat1", cat1N/2), rep("cat2", cat1N/2),  rep("cat1", cat1N/2), rep("cat2", cat1N/2), 
             rep("cat2", 7), rep("cat3", 8), rep("cat3", 3), rep("cat1", 4)) # plan 
value = c(abs(rnorm(cat1N)*100), abs(rnorm(cat1N)*100), abs(rnorm(cat1N)*100), 
          abs(rnorm(cat2N)*100), abs(rnorm(cat3N)*100))

library(data.table)

testData <- data.table(group = group, 
                       year = year, 
                       category = category, 
                       value = value)

I aggregated the data as follows:

cohort = c("group" ,"category", "year")
testAgg <- testData[, .(values = .(.SD)), by = cohort]  # nest each group's remaining column into a list column
> testAgg
     group category year            values
 1: Group1     cat1 2015 <data.table[5x1]>
 2: Group1     cat2 2015 <data.table[5x1]>
 3: Group1     cat1 2016 <data.table[5x1]>
 4: Group1     cat2 2016 <data.table[5x1]>
 5: Group1     cat1 2017 <data.table[5x1]>
 6: Group1     cat2 2017 <data.table[5x1]>
 7: Group2     cat2 2016 <data.table[7x1]>
 8: Group2     cat3 2016 <data.table[8x1]>
 9: Group2     cat3 2017 <data.table[3x1]>
10: Group2     cat1 2017 <data.table[4x1]>

I then want to use mapply to apply the same function to every row:

calculateCI <- function(value){
  
  avg <- mean(value)
  s <- sqrt(var(value))
  n <- length(value)
  
  error <- qnorm(0.975)*s/sqrt(n)  # half-width of a 95% normal-approximation interval
  
  lower <- avg - error
  upper <- avg + error
  
  return(c(lower, upper)) 
} 

> testAgg[, 'lowerCI' := mapply(calculateCI, values[1])[1]]
Warning message:
In mean.default(value) : argument is not numeric or logical: returning NA
> testAgg[, 'upperCI' := mapply(calculateCI, values[1])[2]]
Warning message:
In mean.default(value) : argument is not numeric or logical: returning NA

What is wrong with my mapply call, and how can I fix it?

The goal is to calculate a confidence interval for the values in each group.
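For reference, calculateCI computes the usual normal-approximation 95% interval for a mean, where x̄ is mean(value), s is sqrt(var(value)) (equivalently sd(value)) and n is length(value):

$$
\bar{x} \pm z_{0.975}\,\frac{s}{\sqrt{n}}, \qquad z_{0.975} = \Phi^{-1}(0.975) \approx 1.96
$$

With groups as small as 3 or 4 observations, a t quantile, qt(0.975, n - 1), is the more conventional choice than qnorm(0.975); the code here keeps qnorm as in the question.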

Answer:

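We don't need mapply; lapply does the job. The warning appears because each element of the values column is a one-column data.table, so calculateCI receives the whole table instead of a numeric vector, and mean() cannot handle that (values[1] also means only the first group's table is ever used). Extracting x$value inside an anonymous function fixes this, and data.table's transpose() turns the resulting list of c(lower, upper) pairs into two columns: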

testAgg[, c("lowerCI", "upperCI") := transpose(lapply(values, 
     function(x) calculateCI(x$value)))]

Output:

> testAgg
     group category  year            values    lowerCI   upperCI
    <char>   <char> <num>            <list>      <num>     <num>
 1: Group1     cat1  2015 <data.table[5x1]>  64.958526 149.68502
 2: Group1     cat2  2015 <data.table[5x1]>  13.234171 176.35595
 3: Group1     cat1  2016 <data.table[5x1]>  43.562119  72.14915
 4: Group1     cat2  2016 <data.table[5x1]>  39.377224 102.42184
 5: Group1     cat1  2017 <data.table[5x1]>  34.121206 138.28066
 6: Group1     cat2  2017 <data.table[5x1]>  -1.573475 124.55888
 7: Group2     cat2  2016 <data.table[7x1]>  46.608221 133.32852
 8: Group2     cat3  2016 <data.table[8x1]>  41.979619 106.16266
 9: Group2     cat3  2017 <data.table[3x1]> 147.817873 171.19777
10: Group2     cat1  2017 <data.table[4x1]>  30.109861 115.44993
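To see why x$value is needed, here is a small sketch on a made-up table. The names toy, toyAgg, elem, badMean and goodCI are hypothetical, not from the question; toyAgg is built the same way as testAgg, and calculateCI is the question's function with the "+" restored and sd() in place of sqrt(var()). Passing a list-column element straight to mean() reproduces the question's warning and returns NA, while extracting the column gives the numeric vector calculateCI expects.

```r
library(data.table)

# calculateCI from the question, with "+" restored and sd() instead of sqrt(var())
calculateCI <- function(value) {
  avg <- mean(value)
  error <- qnorm(0.975) * sd(value) / sqrt(length(value))
  c(avg - error, avg + error)
}

# Hypothetical toy data, aggregated into a list column like testAgg
toy <- data.table(g = c("a", "a", "a", "b", "b"), value = c(1, 2, 3, 10, 14))
toyAgg <- toy[, .(values = .(.SD)), by = g]

elem <- toyAgg$values[[1]]
print(class(elem))  # "data.table" "data.frame": not a numeric vector

# Passing the whole table triggers the question's warning; mean() returns NA
badMean <- suppressWarnings(mean(elem))

# Extracting the column gives calculateCI the numeric input it needs
goodCI <- calculateCI(elem$value)
print(goodCI)
```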

Or, equivalently, with Map:

testAgg[, c("lowerCI", "upperCI") := transpose(Map(function(x) 
    calculateCI(x$value), values))]
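Since the question asked about mapply specifically, here is a sketch of how it can be made to work, again on a hypothetical toy table (toy, toyAgg and direct are illustrative names). The two fixes are to iterate over the whole values column rather than values[1], and to pass x$value rather than the data.table. Because every call returns two numbers, mapply simplifies the results into a 2 x n matrix whose first row holds the lower bounds and second row the upper bounds. The last part shows an alternative, assuming the nested tables are not needed for anything else: compute the interval directly in the grouping step and skip the list column.

```r
library(data.table)

# calculateCI from the question, with "+" restored and sd() instead of sqrt(var())
calculateCI <- function(value) {
  avg <- mean(value)
  error <- qnorm(0.975) * sd(value) / sqrt(length(value))
  c(avg - error, avg + error)
}

# Hypothetical toy data standing in for testData / testAgg
toy <- data.table(g = c("a", "a", "a", "b", "b"), value = c(1, 2, 3, 10, 14))
toyAgg <- toy[, .(values = .(.SD)), by = g]

# mapply over every element of the list column; the result is a 2 x n matrix
toyAgg[, c("lowerCI", "upperCI") := {
  m <- mapply(function(x) calculateCI(x$value), values)
  list(m[1, ], m[2, ])
}]
print(toyAgg[, .(g, lowerCI, upperCI)])

# Alternative without the list column: compute the interval per group directly
direct <- toy[, {
  ci <- calculateCI(value)
  list(lowerCI = ci[1], upperCI = ci[2])
}, by = g]
print(direct)
```

With the question's data, the direct variant would be testData[, ..., by = cohort].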