tapply with a function that changes depending on the factors

Time:11-13

I'm trying to use tapply to apply a user-defined function (UDF), which should be straightforward. However, I would like the function's behaviour to change depending on the data fed in for each "segment" of the split-up dataset. As an example, with a loop:

#Function that gets the mean of the first column of inputs,
#then adds a number which depends on the iris species, done by a lookup from the first value in the fifth column in the data 

testFunction <- function(x,factList){
  lookNo <- match(x[1,5],names(factList))
  return(mean(x[,1]) + factList[[lookNo]])
}

#Function inputs setup : list of numbers and species to go with each one
factorList <- list(10,15,5)
names(factorList) = unique(iris$Species)

testFunction(iris[iris$Species == "setosa",],factorList)

for (i in 1:3){
  x <- testFunction(iris[iris$Species == unique(iris$Species)[i],],factorList)
  if(i==1){ansArr <- x}else{ansArr<- c(ansArr,x)}
}

rbind(tapply(iris$Sepal.Length,iris$Species,mean),unlist(factorList),ansArr)

Now I'm trying to get it working with tapply (if only for cleaner code), but it fails with the error "arguments must have the same length":

> tapply(iris, iris$Species, testFunction, factList = factorList)

I don't really understand what's going wrong, but I'm guessing that tapply relies on getting a single vector as its main input, which it then slices up using the factors, and that it can't do this when fed a matrix (here, a data frame)? I need to pass the whole table because the function needs the main values column to compute on and the fifth column to do the lookup on.
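A quick check appears to support that guess. The following is a minimal sketch using only base R and the built-in iris data: tapply compares the length of its first argument with the length of the grouping factor, and the length of a data frame is its number of columns, not its number of rows. The variable names n_cols, n_rows and col_means are made up for illustration.

```r
# A data frame is a list of columns, so length() counts columns.
n_cols <- length(iris)           # 5
n_rows <- length(iris$Species)   # 150

# 5 != 150, which is consistent with the "same length" complaint
# when the whole data frame is passed as the first argument.
c(n_cols, n_rows)

# A single column has one element per row, so tapply can split it.
col_means <- tapply(iris$Sepal.Length, iris$Species, mean)
col_means
```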

Is there any other way to achieve this besides a loop?

Thanks,

User response:

dt <- iris

library(data.table)    
setDT(dt)

factorList <- list(10,15,5)
names(factorList) = unique(iris$Species)

dt <- dcast(dt[, .(mean = mean(Sepal.Length)), by = Species], . ~ Species, value.var = "mean")[, `.` := NULL]
dt <- rbindlist(list(dt, factorList))
dt <- rbindlist(list(dt, dt[ , lapply(.SD, sum)]))

dt

#    setosa versicolor virginica
# 1:  5.006      5.936     6.588
# 2: 10.000     15.000     5.000
# 3: 15.006     20.936    11.588
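
If the goal is to keep the original helper and only drop the loop, one possible base R alternative is to split the whole data frame by species and apply the function to each piece. This is a sketch rather than the approach used above; it assumes the testFunction and factorList definitions from the question (with the `+` in place), and iris_df is a name introduced here for illustration.

```r
# Helper from the question: mean of column 1 plus a per-species offset
# looked up from the first value in column 5.
testFunction <- function(x, factList) {
  lookNo <- match(x[1, 5], names(factList))
  mean(x[, 1]) + factList[[lookNo]]
}

factorList <- list(10, 15, 5)
names(factorList) <- as.character(unique(iris$Species))

# Work on a plain data.frame so x[1, 5] and x[, 1] use data.frame indexing.
iris_df <- as.data.frame(iris)

# split() yields one data frame per species; sapply() calls the helper on each.
ansArr <- sapply(split(iris_df, iris_df$Species), testFunction,
                 factList = factorList)
ansArr
#     setosa versicolor  virginica
#     15.006     20.936     11.588
```

The values match the third row of the data.table output above. Base R's by() offers a similar split-and-apply on data frames, if a by-object result is acceptable.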