I have the following for loop script:
# Create example data
dataKM <- data.frame(x1 = 1:5,
x2 = 6:10,
x3 = 11:15)
# Duplicate dataframe
datatest <- dataKM[c(1:3)]
# for loop
for(i in colnames(dataKM[,2:ncol(dataKM)])) {
# median of each single column of dataframe
median <- median(dataKM[,i])
# add column in duplicated dataframe with 'High' or 'low' based on median for each column
datatest$median[dataKM[,i] <= median ] <- "Low"
datatest$median[dataKM[,i] > median ] <- "High"
}
I'm trying to repeat the for loop for each column of the dataKM data frame and save the results as columns in the datatest data frame. My script saves only the last iteration, probably because I overwrite the previous value on each pass through the loop. How can I save the output of every iteration in its own column? Can anyone help me? Thank you so much; I hope this is useful for anyone else trying to do something similar.
CodePudding user response:
We can just use the lapply function:
datatest <- dataKM[c(2:3)]
datatest[] <- lapply(dataKM[-1] , function(x) ifelse(x <= median(x) , "Low" , "High"))
colnames(datatest) <- c("x2Median" , "x3Median")
cbind(dataKM , datatest)
- output
x1 x2 x3 x2Median x3Median
1 1 6 11 Low Low
2 2 7 12 Low Low
3 3 8 13 Low Low
4 4 9 14 High High
5 5 10 15 High High
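As a variation on the lapply approach above, the new column names can be built from the existing ones instead of being typed by hand, which helps when there are more than two columns. This is only a sketch: the helper variables cols, labels, and result are names chosen for illustration, and stringsAsFactors = FALSE is set explicitly on the assumption that the labels should stay character on older R versions too.

```r
# Example data from the question
dataKM <- data.frame(x1 = 1:5,
                     x2 = 6:10,
                     x3 = 11:15)

# Columns to label: everything except the first one
cols <- names(dataKM)[-1]

# One character vector of "Low"/"High" per column, compared with that column's median
labels <- lapply(dataKM[cols], function(x) ifelse(x <= median(x), "Low", "High"))

# Derive the new names from the old ones, e.g. "x2" becomes "x2Median"
names(labels) <- paste0(cols, "Median")

# Bind the labels next to the original data
result <- cbind(dataKM, as.data.frame(labels, stringsAsFactors = FALSE))
result
```

With the example data this should give the same columns as the hard-coded version, x2Median and x3Median.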
If you insist on using a for loop, try this:
datatest <- dataKM[c(1:3)]
for(i in colnames(dataKM[-1])) {
median <- median(dataKM[,i])
datatest[[paste0(i,"median")]][dataKM[,i] <= median ] <- "Low"
datatest[[paste0(i,"median")]][dataKM[,i] > median ] <- "High"
}
CodePudding user response:
I am not sure what is being compared with what, but here is an example where each x2 or x3 value is compared with its own column median.
Here is a dplyr approach:
library(dplyr)
dataKM %>%
mutate(across(-1, ~case_when(. <= median(., na.rm=TRUE) ~ "Low",
. > median(., na.rm=TRUE) ~ "High"), .names = "Median_{.col}"))
x1 x2 x3 Median_x2 Median_x3
1 1 6 11 Low Low
2 2 7 12 Low Low
3 3 8 13 Low Low
4 4 9 14 High High
5 5 10 15 High High
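The na.rm=TRUE argument in the call above only matters when a column contains missing values. A small base R sketch of that case, using a made-up vector x that is not part of the original data, might look like this:

```r
# Hypothetical column with one missing value (not from the question's data)
x <- c(6, 7, NA, 9, 10)

# Without na.rm = TRUE the median itself would be NA
med <- median(x, na.rm = TRUE)

# Rows with a missing value stay NA; the others are labelled as usual
labels <- ifelse(x <= med, "Low", "High")
labels
```

The missing entry is left as NA rather than being forced into "Low" or "High", which is usually the safer default.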
CodePudding user response:
Currently, you are updating a single new column, median, on every pass. Simply adjust the loop to create a new column on each iteration, naming it by concatenating the current column name and "_median".
# for loop
for(col in colnames(dataKM[,2:ncol(dataKM)])) {
curr_col <- dataKM[[col]]
# median of each single column of dataframe
col_median <- median(curr_col)
# add column in duplicated dataframe with 'High' or 'low' based on median for each column
datatest[[paste0(col, "_median")]][curr_col <= col_median] <- "Low"
datatest[[paste0(col, "_median")]][curr_col > col_median] <- "High"
}
Alternatively, with ifelse:
for(col in colnames(dataKM[,2:ncol(dataKM)])) {
curr_col <- dataKM[[col]]
col_median <- median(curr_col)
datatest[[paste0(col, "_median")]] <- ifelse(
curr_col <= col_median, "Low", "High"
)
}
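If the same labelling is needed for several data frames, the ifelse loop can be wrapped in a small function. The following is a sketch only: add_median_labels and its cols and suffix arguments are hypothetical names, not something from the original post, and it assumes the columns to label are numeric with no missing values.

```r
# Hypothetical helper: adds one "<col><suffix>" column per selected column
add_median_labels <- function(df, cols = names(df)[-1], suffix = "_median") {
  for (col in cols) {
    values <- df[[col]]
    # A new column name per iteration, so nothing is overwritten
    df[[paste0(col, suffix)]] <- ifelse(values <= median(values), "Low", "High")
  }
  df
}

# Example data from the question
dataKM <- data.frame(x1 = 1:5,
                     x2 = 6:10,
                     x3 = 11:15)

datatest <- add_median_labels(dataKM)
datatest
```

By default it skips the first column, as the loops above do; pass cols explicitly to label a different set.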