Home > Net >  R: for loop output save only last results in output dataframe
R: for loop output save only last results in output dataframe

Time:07-31

I have the following for loop script:

# Create example data
dataKM <- data.frame(x1 = 1:5,    
                     x2 = 6:10,
                     x3 = 11:15)
# Duplicate dataframe
datatest <- dataKM[c(1:3)]

# for loop
for(i in colnames(dataKM[,2:ncol(dataKM)])) {
  # median of each single column of dataframe
  median <- median(dataKM[,i])
  # add column in duplicated dataframe with 'High' or 'low' based on median for each column
  datatest$median[dataKM[,i] <= median ] <- "Low"
  datatest$median[dataKM[,i] > median ] <- "High"
}

I'm trying to repeat for loop for each column of dataKM dataframe and save results as column in dataset dataframe. My script save only the last iteration. Probably I get a single output because I overwrite the previous value on each pass in the loop. I'd like to know how I can save all for loop output in their respective column. Can anyone help me? Thank you so much, I hope this can be useful even for someone else trying to do something similar.

CodePudding user response:

We can just use lapply function

datatest <- dataKM[c(2:3)]
datatest[] <- lapply(dataKM[-1] , function(x) ifelse(x <= median(x) , "Low" , "High"))

colnames(datatest) <- c("x2Median" , "x3Median")

cbind(dataKM , datatest)

  • output
  x1 x2 x3  x2Median x3Median
1  1  6 11      Low      Low
2  2  7 12      Low      Low
3  3  8 13      Low      Low
4  4  9 14      High     High
5  5 10 15      High     High

If you insist using for loop try this

datatest <- dataKM[c(1:3)]

for(i in colnames(dataKM[-1])) {
    median <- median(dataKM[,i])
    datatest[[paste0(i,"median")]][dataKM[,i] <= median ] <- "Low"
    datatest[[paste0(i,"median")]][dataKM[,i] > median ] <- "High"
}

CodePudding user response:

I am not sure what is compared with what. But here is an example were x2 value or x3 value is compared with its column median:

Here is a dplyr approach:

library(dplyr)

dataKM %>% 
  mutate(across(-1, ~case_when(. <= median(., na.rm=TRUE) ~ "Low",
                               . > median(., nar.rm=TRUE) ~ "High"), .names = "Median_{.col}"))
  x1 x2 x3 Median_x2 Median_x3
1  1  6 11       Low       Low
2  2  7 12       Low       Low
3  3  8 13       Low       Low
4  4  9 14      High      High
5  5 10 15      High      High

CodePudding user response:

Currently, you are updating a single new column, median. Simply adjust to create new median column with each iteration of for loop, concatenating the column current column name and median.

# for loop
for(col in colnames(dataKM[,2:ncol(dataKM)])) {
  curr_col <- dataKM[[col]]
  # median of each single column of dataframe
  col_median <- median(curr_col)

  # add column in duplicated dataframe with 'High' or 'low' based on median for each column
  datatest[[paste0(col, "_median")]][curr_col <= col_median] <- "Low"
  datatest[[paste0(col, "_median")]][curr_col > col_median] <- "High"
}

Alternatively, with ifelse:

for(col in colnames(dataKM[,2:ncol(dataKM)])) {
  curr_col <- dataKM[[col]]
  col_median <- median(curr_col)

  datatest[[paste0(col, "_median")]] <- ifelse(
    curr_col <= col_median, "Low", " High"
  )
}
  • Related