My data contains consecutive columns 1, 2, ..., 2000. I want to apply a function that returns 3 values for each group of 100 columns, for each row.
The data look like this:
1 2 3 ..... 2000
0.01 0.0 0.002 0.03
0.005 0.002 0.011 0.04
0.001 0.003 0.004 0.0
Here is the code I tried:
prep_data <- function(df){
  # Create column names: grp1_1, grp1_2, grp1_3, ..., grp20_3
  colnms <- c()
  for(i in seq(1, 20, 1)){
    for(j in seq(1, 3, 1)){
      f <- paste0("grp", i, "_", j)
      colnms <- c(colnms, f)
    }
  }
  # Pre-allocate the 60-column result
  trans <- data.frame(matrix(ncol = 60, nrow = NROW(df)))
  colnames(trans) <- colnms
  # Looping over every row
  for (i in 1:NROW(df)){
    X <- c()
    # Looping over each group of 100 columns (starting at 1, 101, ..., 1901)
    for(j in seq(1, 1901, 100)){
      end <- j + 99
      tmp <- subset(df[i, ], select = j:end)
      # Here I apply the function over the 100 columns for the current row to get 3 values
      X <- c(X, MY_FUNC(tmp))
    }
    # Append the current row
    trans[i,] <- X
  }
  return(trans)
}
The expected output (A dataframe of 60 columns) is as follows:
grp1_1 grp1_2 grp1_3 ..... grp20_3
0.01 0.0 0.002 0.03
0.005 0.002 0.011 0.04
0.001 0.003 0.004 0.0
My code runs, but it's too slow, probably because all the loops make it inefficient.
Thanks in advance
CodePudding user response:
Here is one approach:
Let d be your 3 rows x 2000 columns frame, with column names as.character(1:2000) (see below for generation of fake data). We add a row identifier using .I, then melt the data long, adding grp, a column-group identifier (i.e. identifying the 20 sets of 100 columns). Then apply your function myfunc (see below for a stand-in function for this example), by row and group, and swing wide. (I used stringr::str_pad to add a 0 to the front of the group number.)
# add row identifier
d[, row:=.I]
# melt and add col group identifier
dm = melt(d,id.vars = "row",variable.factor = FALSE)[,variable:=as.numeric(variable)][order(variable,row), grp:=rep(1:20, each=300)]
# get the result (180 rows long), applying myfunc to each set of columns, by row;
# frow indexes the three values returned by myfunc
result = dm[, myfunc(value), by=.(row,grp)][,frow:=rep(1:3,times=60)]
# swing wide (3 rows long, 60 columns wide); columns are named grp<group>_<data row>
dcast(
  result[,v:=paste0("grp",stringr::str_pad(grp,2,pad = "0"),"_",row)],
  frow~v,value.var="V1"
)[, frow:=NULL][]
Output: (first six columns only)
grp01_1 grp01_2 grp01_3 grp02_1 grp02_2 grp02_3
<num> <num> <num> <num> <num> <num>
1: 0.54187168 0.47650694 0.48045694 0.51278399 0.51777319 0.46607845
2: 0.06671367 0.08763655 0.08076939 0.07930063 0.09830116 0.07807937
3: 0.25828989 0.29603471 0.28419957 0.28160367 0.31353016 0.27942687
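Note that in the output above, the three rows correspond to the three values returned by myfunc, and the columns to group/data-row pairs. If the layout from the question is wanted instead (one output row per input row, columns named grp<i>_<k> for the k-th returned value), a base R sketch along the following lines might work. It is only an illustration under stated assumptions: my_func is a toy stand-in for the asker's MY_FUNC, all columns are assumed numeric, and the data are held in a matrix m.

```r
# Toy stand-in for the asker's MY_FUNC (assumption: it takes a numeric vector
# and returns exactly 3 values)
my_func <- function(x) c(mean(x), var(x), sd(x))

# Fake data: 3 rows x 2000 columns (assumption: all columns are numeric)
set.seed(123)
m <- matrix(runif(3 * 2000), nrow = 3)

# Map each column to its group of 100: columns 1-100 -> 1, ..., 1901-2000 -> 20
grp <- (seq_len(ncol(m)) - 1) %/% 100 + 1

# For each group, apply my_func to every row. apply() returns a 3 x nrow
# matrix, so t() turns it into nrow x 3 before the blocks are bound together.
blocks <- lapply(split(seq_len(ncol(m)), grp), function(cols) {
  t(apply(m[, cols, drop = FALSE], 1, my_func))
})
res <- as.data.frame(do.call(cbind, unname(blocks)))
colnames(res) <- paste0("grp", rep(1:20, each = 3), "_", rep(1:3, times = 20))

dim(res)  # 3 60
```

For a real data frame df with only numeric columns, as.matrix(df) could supply m. This still calls the function once per row and group, so the gain over the original loop comes mainly from avoiding repeated data-frame subsetting and vector growing.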
Input:
d = data.table()
alloc.col(d,2000)
set.seed(123)
for(c in 1:2000) set(d,j=as.character(c), value=runif(3))
myfunc function (toy example for this answer):
myfunc <- function(x) c(mean(x), var(x), sd(x))
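As a quick illustration of the contract this toy function follows (a numeric vector in, a length-3 numeric vector out), which is why each row/group pair contributes three rows to result, one might check it on a single made-up vector. The seed and the vector x here are arbitrary choices for illustration only.

```r
myfunc <- function(x) c(mean(x), var(x), sd(x))

set.seed(1)
x <- runif(100)   # stands in for one row's 100 values within one group
out <- myfunc(x)

length(out)                              # 3
isTRUE(all.equal(out[3], sqrt(out[2])))  # TRUE: sd is the square root of var
```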