for easier explanation I'm gonna use a smaller example.
I have two DF:
DF1: T01 T02 T03 T04 T05
1 15 20 48 25 5
2 12 18 35 30 12
3 13 15 50 60 42
DF2: MEDIAN SD
T01 13 1.24
T02 18 2.05
T03 45 6.64
T04 30 15.45
T05 12 16.04
What I want to do is create a loop that adds a dummy to DF1 for each variable, that take value 1 if DF1$T01 ≈ (almost equal) to DF2$MEDIAN[1], and 0 if it's not, and then goes to T02, T03, until it breaks.
Until now, I haven't been able to create a loop (I'm not really good at creating loops tho) that makes this. I did manage to make the dummy for one of the variables (T01), but in the real DF I have over 40 variables, so doing it by hand it´s not efficient at all. What I have right now is:
DF1$dummyt01 <- ifelse(almost.equal(DF1$T01, DF2$MEDIAN[1], tolerance = 2),1,0)
outcome expected:
DF1: T01 T02 T03 T04 T05 dummyT01 dummyT02 ... dummyT05
1 15 20 48 25 5 1 1 ... 0
2 12 18 35 30 12 1 1 ... 1
3 13 15 50 60 42 1 0 ... 0
Note: Not a native english speaker. Sorry for any mistakes.
EDIT: Expected Outcome.
CodePudding user response:
We may use tidyverse
. Loop across
the columns of 'DF1', get the column names of that column looped (cur_column()
), use that to subset the 'DF2' (as row names) 'MEDIAN' element, do the comparison with almost.equal
to return a logical vector, which is coerced to binary with as.integer
or
. In the .names
add the prefix 'dummy' so as to create as new columns
library(dplyr)
library(berryFunctions)
DF1 <- DF1 %>%
mutate(across(everything(), ~ (almost.equal(.,
DF2[cur_column(), "MEDIAN"], tolerance = 1)),
.names = "dummy{.col}"))
-output
DF1
T01 T02 T03 T04 T05 dummyT01 dummyT02 dummyT03 dummyT04 dummyT05
1 15 20 48 25 5 0 0 0 0 0
2 12 18 35 30 12 1 1 0 1 1
3 13 15 50 60 42 1 0 0 0 0
Or using a for
loop
for(i in seq_along(DF1))
DF1[paste0('dummy', names(DF1)[i])] <- (almost.equal(DF1[[i]],
DF2[names(DF1)[i], "MEDIAN"], tolerance = 1))
data
DF1 <- structure(list(T01 = c(15L, 12L, 13L), T02 = c(20L, 18L, 15L),
T03 = c(48L, 35L, 50L), T04 = c(25L, 30L, 60L), T05 = c(5L,
12L, 42L)), class = "data.frame", row.names = c("1", "2",
"3"))
DF2 <- structure(list(MEDIAN = c(13L, 18L, 45L, 30L, 12L), SD = c(1.24,
2.05, 6.64, 15.45, 16.04)), class = "data.frame", row.names = c("T01",
"T02", "T03", "T04", "T05"))