Loop to create dummies out of two df R

Time:10-22

For easier explanation, I'm going to use a smaller example.

I have two data frames:

DF1:  T01  T02  T03  T04  T05
  1    15   20   48   25    5
  2    12   18   35   30   12
  3    13   15   50   60   42

DF2:   MEDIAN  SD
 T01   13      1.24 
 T02   18      2.05
 T03   45      6.64
 T04   30      15.45
 T05   12      16.04

What I want is a loop that adds a dummy variable to DF1 for each column. The dummy takes the value 1 if the value in DF1$T01 is approximately equal to DF2$MEDIAN[1] and 0 otherwise; the loop then moves on to T02, T03, and so on through the last column.

So far I haven't been able to write a loop that does this (I'm not very good at writing loops). I did manage to create the dummy for one variable (T01), but the real DF has over 40 variables, so doing it by hand is not efficient at all. What I have right now is:

DF1$dummyt01 <- ifelse(almost.equal(DF1$T01, DF2$MEDIAN[1], tolerance = 2),1,0)

Expected outcome:

DF1:  T01  T02  T03  T04  T05  dummyT01  dummyT02  ...  dummyT05
  1    15   20   48   25    5         1         1  ...         0
  2    12   18   35   30   12         1         1  ...         1
  3    13   15   50   60   42         1         0  ...         0

Note: I'm not a native English speaker. Sorry for any mistakes.

EDIT: Expected Outcome.

CodePudding user response:

We can use the tidyverse. Loop across the columns of 'DF1' with across(), get the name of the current column with cur_column(), and use it to select the matching row (by row name) of the 'MEDIAN' column in 'DF2'. The comparison with almost.equal returns a logical vector, which is coerced to binary with as.integer or unary +. In .names, add the prefix 'dummy' so that the results are created as new columns.

library(dplyr)
library(berryFunctions)  # provides almost.equal()
DF1 <- DF1 %>%
    # unary + coerces the logical result to 0/1
    mutate(across(everything(), ~ +(almost.equal(.,  
         DF2[cur_column(), "MEDIAN"], tolerance = 1)),
           .names = "dummy{.col}"))

-output

DF1
 T01 T02 T03 T04 T05 dummyT01 dummyT02 dummyT03 dummyT04 dummyT05
1  15  20  48  25   5        0        0        0        0        0
2  12  18  35  30  12        1        1        0        1        1
3  13  15  50  60  42        1        0        0        0        0
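
Note that this output uses tolerance = 1, while the attempt in the question used tolerance = 2, which is why the table differs from the expected outcome. Below is a minimal base R sketch of that effect. It assumes "almost equal" means the absolute difference is at most the tolerance; the helper names near_median and dummies are hypothetical and not part of berryFunctions.

```r
# Example data from the question
DF1 <- data.frame(T01 = c(15, 12, 13), T02 = c(20, 18, 15),
                  T03 = c(48, 35, 50), T04 = c(25, 30, 60),
                  T05 = c(5, 12, 42))
DF2 <- data.frame(MEDIAN = c(13, 18, 45, 30, 12),
                  row.names = c("T01", "T02", "T03", "T04", "T05"))

# Hypothetical helper: 1 if x is within `tol` of `med`, else 0
# (assumption: an absolute-difference test stands in for almost.equal)
near_median <- function(x, med, tol) as.integer(abs(x - med) <= tol)

# Build all dummy columns for a given tolerance
dummies <- function(tol) {
  # Pair each column of DF1 with the median in the matching row of DF2
  out <- Map(near_median, DF1, DF2[names(DF1), "MEDIAN"], tol)
  as.data.frame(setNames(out, paste0("dummy", names(DF1))))
}

tol1 <- dummies(1)  # same tolerance as the answer
tol2 <- dummies(2)  # same tolerance as the question's attempt
```

With this data, tol2 has dummyT01 equal to 1 in every row, as in the expected outcome, while tol1 gives 0 for the first row because 15 is two units away from the median of 13.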

Or using a for loop

for(i in seq_along(DF1))
   DF1[paste0('dummy', names(DF1)[i])] <- +(almost.equal(DF1[[i]], 
      DF2[names(DF1)[i], "MEDIAN"], tolerance = 1))
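
One caveat with seq_along(DF1): if DF1 already contains dummy columns (for example, after running the dplyr version first), the loop visits those columns too. A hedged sketch of a variant that loops only over the variables present in both data frames is shown below. It swaps in a plain absolute-difference test for almost.equal so it runs without berryFunctions, and the names vars and tol are illustrative.

```r
# Example data from the question
DF1 <- data.frame(T01 = c(15, 12, 13), T02 = c(20, 18, 15),
                  T03 = c(48, 35, 50), T04 = c(25, 30, 60),
                  T05 = c(5, 12, 42))
DF2 <- data.frame(MEDIAN = c(13, 18, 45, 30, 12),
                  row.names = c("T01", "T02", "T03", "T04", "T05"))

tol <- 1  # same tolerance as the answer

# Only loop over columns that have a matching row in DF2
vars <- intersect(names(DF1), rownames(DF2))

for (v in vars) {
  # abs() test stands in for almost.equal() (assumption);
  # as.integer() turns TRUE/FALSE into 1/0
  DF1[[paste0("dummy", v)]] <- as.integer(abs(DF1[[v]] - DF2[v, "MEDIAN"]) <= tol)
}
```

Because vars is computed from the names shared with DF2, rerunning the block overwrites the existing dummy columns instead of adding new ones.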

data

DF1 <- structure(list(T01 = c(15L, 12L, 13L), T02 = c(20L, 18L, 15L), 
    T03 = c(48L, 35L, 50L), T04 = c(25L, 30L, 60L), T05 = c(5L, 
    12L, 42L)), class = "data.frame", row.names = c("1", "2", 
"3"))
DF2 <- structure(list(MEDIAN = c(13L, 18L, 45L, 30L, 12L), SD = c(1.24, 
2.05, 6.64, 15.45, 16.04)), class = "data.frame", row.names = c("T01", 
"T02", "T03", "T04", "T05"))