I have a dataset which contains about 40 different variables. Now I would like to create a new variable indicating whether each observation is above or below the median.
I managed to create a new variable "var1_mediansplit" from the existing "var1" (values 1 for below median, 2 for everything else):
mydata$var1_mediansplit <- ifelse(mydata$var1 < median(mydata$var1), "1", "2")
I am looking for a way to run it through several variables (with a loop, I would guess). I appreciate any help!
CodePudding user response:
Using colMedians and eachrow from the Rfast package:
library(Rfast)
df <- as.data.frame(matrix(runif(4000), ncol = 40)) # dummy data
m <- as.matrix(df)
# subtract each column's median, then map FALSE/TRUE to 1 (below) / 2 (at or above)
df2 <- as.data.frame((eachrow(m, colMedians(m), "-") >= 0) + 1)
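For comparison, the same split can probably be done in base R without Rfast. This is only a sketch, assuming every column is numeric and has no missing values; the names `meds` and `df2_base` are illustrative and not part of the answer above.

```r
set.seed(1)                                          # reproducible dummy data
df <- as.data.frame(matrix(runif(4000), ncol = 40))
meds <- vapply(df, median, numeric(1))               # one median per column
# sweep() subtracts each column's median from that column;
# (>= 0) + 1 turns FALSE/TRUE into 1 (below median) / 2 (at or above)
df2_base <- as.data.frame((sweep(as.matrix(df), 2, meds, "-") >= 0) + 1)
```

This should be slower than Rfast on large matrices, but it avoids the extra dependency.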
CodePudding user response:
Don't overcomplicate it, consider this:
v <- 1:100       # example vector
x <- median(v)   # its median
y <- v >= x      # TRUE at or above the median, FALSE below
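To apply the same comparison to several variables of a data frame, one option is to loop over the column names with lapply. A minimal sketch, assuming a data frame `mydata` with numeric columns: the dummy data and the selection in `vars` are made up for illustration, the `_mediansplit` suffix follows the question's naming, and `na.rm = TRUE` is an added assumption in case some values are missing.

```r
set.seed(42)
mydata <- data.frame(var1 = rnorm(10), var2 = rnorm(10), var3 = rnorm(10))  # dummy data
vars <- c("var1", "var2")  # hypothetical choice of columns to split

# For each selected column: 1 if below its own median, 2 otherwise
mydata[paste0(vars, "_mediansplit")] <- lapply(mydata[vars], function(x) {
  ifelse(x < median(x, na.rm = TRUE), 1, 2)
})
```

Setting `vars <- names(mydata)` before the lapply call would split every column, provided they are all numeric.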