Applying a function to every row on each n number of columns in R

Time:03-17

My data contains consecutive columns 1, 2, ..., 2000. For each row, I want to apply a function that returns 3 values for each group of 100 columns.

The data look like this:

  1       2        3    .....   2000  
0.01    0.0       0.002         0.03
0.005   0.002     0.011         0.04
0.001   0.003     0.004         0.0

Here is the code I tried:

prep_data <- function(df){
  # Create column names: grp1_1, grp1_2, grp1_3, ..., grp20_3
  colnms <- c()
  for(i in seq(1, 20, 1)){
    for(j in seq(1, 3, 1)){
      f <- paste0("grp", i, "_", j)
      colnms <- c(colnms, f)
    }
  }
  # Preallocate the 60-column result
  trans <- data.frame(matrix(ncol = 60, nrow = NROW(df)))
  colnames(trans) <- colnms

  # Loop over every row
  for (i in 1:NROW(df)){
    X <- c()
    # Loop over each group of 100 columns (start columns 1, 101, ..., 1901)
    for(j in seq(1, 1901, 100)){
      end <- j + 99
      tmp <- subset(df[i, ], select = j:end)
      # Apply the function to the 100 columns of the current row to get 3 values
      X <- c(X, MY_FUNC(tmp))
    }
    # Store the 60 values for the current row
    trans[i, ] <- X
  }
  return(trans)
}

The expected output (A dataframe of 60 columns) is as follows:

grp1_1  grp1_2    grp1_3 .....  grp20_3  
0.01    0.0       0.002         0.03
0.005   0.002     0.011         0.04
0.001   0.003     0.004         0.0

My code runs, but it's too slow, probably because all the loops make it inefficient.

Thanks in advance

CodePudding user response:

Here is one approach:

Let d be your 3-row x 2000-column data.table, with column names as.character(1:2000) (see below for how the fake data was generated). We add a row identifier using .I, then melt the data long and add grp, a column-group identifier (i.e. identifying the 20 sets of 100 columns). Then we apply your function myfunc (see below for the stand-in function used in this example) by row and group, and swing wide. (I used stringr::str_pad to add a leading 0 to the group number.)

# add row identifier
d[, row:=.I]

# melt and add col group identifier
dm = melt(d,id.vars = "row",variable.factor = F)[,variable:=as.numeric(variable)][order(variable,row), grp:=rep(1:20, each=300)]

# get the result (180 rows long), applying myfunc to each set of columns, by row;
# frow indexes the three values that myfunc returns
result = dm[, myfunc(value), by=.(row,grp)][,frow:=rep(1:3,times=60)]

# swing wide (3 rows long, 60 columns wide): one row per original row,
# columns named grp<group>_<index of returned value>
dcast(
  result[,v:=paste0("grp",stringr::str_pad(grp,2,pad = "0"),"_",frow)],
  row~v,value.var="V1"
  )[, row:=NULL][]

Output: (first six columns only)

      grp01_1    grp01_2    grp01_3    grp02_1    grp02_2    grp02_3
        <num>      <num>      <num>      <num>      <num>      <num>
1: 0.54187168 0.06671367 0.25828989 0.51278399 0.07930063 0.28160367
2: 0.47650694 0.08763655 0.29603471 0.51777319 0.09830116 0.31353016
3: 0.48045694 0.08076939 0.28419957 0.46607845 0.07807937 0.27942687
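
If data.table and the long format are not needed, the same block-wise idea can be sketched in base R. This is only an illustration, not a benchmarked solution: prep_data_blocks, its width and n_out arguments, and the toy df and MY_FUNC below are hypothetical stand-ins, and it assumes all columns are numeric and that the function always returns n_out values.

```r
# Toy stand-ins (assumptions): 3 x 2000 random data and a 3-value function
set.seed(123)
df <- as.data.frame(matrix(runif(3 * 2000), nrow = 3))
MY_FUNC <- function(x) c(mean(x), var(x), sd(x))

prep_data_blocks <- function(df, fun, width = 100, n_out = 3) {
  m <- as.matrix(df)                      # work on a numeric matrix, not a data.frame
  n_grp <- ncol(m) %/% width              # 20 groups for 2000 columns
  starts <- seq(1, by = width, length.out = n_grp)
  blocks <- lapply(starts, function(s) {
    cols <- s:(s + width - 1)
    # vapply yields n_out values per input row (an n_out x nrow matrix)
    vals <- vapply(seq_len(nrow(m)), function(i) fun(m[i, cols]), numeric(n_out))
    # refill so that each input row becomes one output row
    matrix(vals, nrow = nrow(m), byrow = TRUE)
  })
  out <- as.data.frame(do.call(cbind, blocks))
  colnames(out) <- paste0("grp", rep(seq_len(n_grp), each = n_out),
                          "_", rep(seq_len(n_out), times = n_grp))
  out
}

res <- prep_data_blocks(df, MY_FUNC)
dim(res)  # 3 rows, 60 columns
```

The column names follow the grp<group>_<value index> pattern from the question, without zero padding. Converting to a matrix once and building each block with vapply avoids growing a vector and assigning into a data.frame row by row, which is the likely cost in the original loops.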

Input:

library(data.table)

d = data.table()
alloc.col(d,2000)
set.seed(123)
for(c in 1:2000)  set(d,j=as.character(c), value=runif(3))

myfunc Function (toy example for this answer):

myfunc <- function(x) c(mean(x), var(x), sd(x))
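
A quick way to sanity-check any of this is to compute the group index straight from the column number and recompute one row/group combination by hand. The sketch below uses only base R; the matrix m is an assumed stand-in for the fake data (it draws the same random stream column by column, so it should match d, but that is not verified here), and the choice of row 2 and group 5 is arbitrary.

```r
myfunc <- function(x) c(mean(x), var(x), sd(x))

# Column j belongs to group (j - 1) %/% 100 + 1
col_id <- 1:2000
grp <- (col_id - 1) %/% 100 + 1
table(grp)[1:3]   # each group holds 100 columns

# Stand-in data (assumption): 3 rows x 2000 columns, filled column by column
set.seed(123)
m <- matrix(runif(3 * 2000), nrow = 3)

# The three values for row 2, group 5 (columns 401 to 500)
check <- myfunc(m[2, grp == 5])
check
```

Deriving grp from the column number does not depend on how many rows the data has, unlike a hard-coded rep(1:20, each=300), so it carries over unchanged to larger inputs.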