How to run a function on individual columns instead of a data frame?-CodePudding

Hello everyone. I have two data frames and am bootstrapping with Script1 below. In Script1 I take the number of rows from each whole data frame. Instead, I want to split out each column as its own data frame, remove the zero values, take the row count from that, and then run the same bootstrapping. In Script2 I create the individual data frames in a for loop, but I am new to R and not sure how to add the Script1 logic to it efficiently.

Please suggest how to do this. Below are Script1, which runs, and Script2, where I subset each column into an individual data frame.

Script1

set.seed(2)
m1 <- matrix(sample(c(0, 1:10), 100, replace = TRUE), 10)
m2 <- matrix(sample(c(0, 1:5), 50, replace = TRUE), 5)
m1 <- as.data.frame(m1)
m2 <- as.data.frame(m2)
nboot <- 1e3

n_m1 <- nrow(m1); n_m2 <- nrow(m2)

temp<- c()
for (j in seq_len(nboot)) {
  boot <- sample(x = seq_len(n_m1), size = n_m2, replace = TRUE)
  value <- colSums(m2)/colSums(m1[boot,])
  temp <- rbind(temp, value)
}
boot_data<- apply(temp, 2, median)

Script2

for (i in colnames(m1)){
  m1_subset=(m1[m1[[i]] > 0, ])
  m1_subset=m1_subset[i]
  m2_subset=m2[m2[[i]] >0, ]
  m2_subset=m2_subset[i]
  n_m1 <- nrow(m1_subset); n_m2 <- nrow(m2_subset) # next, I want to run the Script1 bootstrap on these subsets
}

CodePudding user response:

If I understand correctly, you want to do the sampling and calculation on each column individually, after removing the 0 values. I modified your code to work on a single vector instead of a data frame (i.e., using length() instead of nrow() and sum() instead of colSums()). I also suggest creating the empty matrix for your results ahead of time and filling it in; it will be faster.
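As a small illustration of why length() and sum() take over from nrow() and colSums(): selecting a single column with `[rows, i]` returns a plain vector, not a data frame. The toy data frame `df` and its values are made up for this sketch.

```r
# Toy data frame (hypothetical values, for illustration only)
df <- data.frame(V1 = c(0, 3, 5), V2 = c(2, 0, 4))

# Selecting one column with [rows, i] drops to a plain numeric vector
v <- df[df[, 1] > 0, 1]

is.data.frame(v)  # FALSE: v is a vector, so nrow(v) would be NULL
length(v)         # number of non-zero entries in column 1
sum(v)            # total of column 1 after removing zeros
```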

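# Preallocate the result matrix: one row per bootstrap draw, one column per variable
temp <- matrix(nrow = nboot, ncol = ncol(m1))
for (i in seq_along(m1)){
  # Keep only the non-zero values of column i (each result is a plain vector)
  m1_subset <- m1[m1[,i] > 0, i]
  m2_subset <- m2[m2[,i] > 0, i]
  n_m1 <- length(m1_subset); n_m2 <- length(m2_subset)
  for (j in seq_len(nboot)) {
    # Resample positions of m1_subset, drawing as many as m2_subset has values
    boot <- sample(x = seq_len(n_m1), size = n_m2, replace = TRUE)
    temp[j, i] <- sum(m2_subset)/sum(m1_subset[boot])
  }
}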
boot_data <- apply(temp, 2, median)
boot_data <- setNames(data.frame(t(boot_data)), names(m1))
boot_data
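The same per-column logic can also be wrapped in a function and applied to matching columns, which avoids the explicit index bookkeeping. This is only a sketch: the helper name `boot_ratio_median` is hypothetical, and it assumes m1 and m2 have the same number of columns in the same order, as in the example data.

```r
# Hypothetical helper: bootstrap the ratio for one pair of columns.
# x and y are numeric vectors (one column of m1 and one of m2).
boot_ratio_median <- function(x, y, nboot = 1000) {
  x <- x[x > 0]  # drop zeros from the m1 column
  y <- y[y > 0]  # drop zeros from the m2 column
  ratios <- replicate(nboot, {
    # Resample positions of x, drawing as many as y has values
    idx <- sample.int(length(x), size = length(y), replace = TRUE)
    sum(y) / sum(x[idx])
  })
  median(ratios)
}

# Same example data as in the question
set.seed(2)
m1 <- as.data.frame(matrix(sample(c(0, 1:10), 100, replace = TRUE), 10))
m2 <- as.data.frame(matrix(sample(c(0, 1:5), 50, replace = TRUE), 5))

# Apply the helper to matching columns of m1 and m2
boot_data <- mapply(boot_ratio_median, m1, m2, MoreArgs = list(nboot = 1000))
boot_data
```

mapply() walks the two data frames column by column and returns a named numeric vector, one median per column. A column with no non-zero values in m1 would make sample.int() fail, so such columns would need to be handled separately.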