R: Problem using a for loop to modify existing variables in a data.table; the loop does not affect t

Time:03-29

Thanks in advance, and sorry if something is unclear, it's my first time posting here. I am working on something that should be fairly simple, but I cannot seem to find a way of making it work.

The task that I want to complete is the following: I have a dataset with hundreds of variables. I need to recode all of them following the same logic. The logic is the following: if the GIVEN VARIABLE == 0 & a SPECIFIC VARIABLE == 1, the GIVEN VARIABLE must = -1. The SPECIFIC VARIABLE is the same for all of them.

What I have done is the following:

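library(data.table)
set.seed(123)
# d alternates 1, 0, 1, ...; written at full length so it does not rely on recycling
data <- data.table(a = 0:10, b = 0:10, c = 0:10, d = rep(1:0, length.out = 11))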

Here "d" is the SPECIFIC VARIABLE and a:c are the GIVEN VARIABLEs

list_variables <- names(data)  
list_variables_v2 <- list_variables[-c(4)] 

I extracted the names of the variables from the dataset (minus d) and stored them in a character vector, so they can be fed into the loop

data_v1 = copy(data)

for(i in (list_variables_v2)) {
  data_v1[(i) == 0 & d == 1, (i) := -1]
}

Problematically, when I run the loop nothing happens. Those variables that comply with the condition (e.g. a == 0 & d == 1) are not recoded as -1. Various problems could be happening, but I think I have reduced them to one. Potential problems:

a) The code, even outside the loop, does not work. But this is not true. The following code produces the expected result:

data_v1[a == 0 & d == 1, a := -1]

b) The loop itself is not working, i.e. the variable names are not being iterated over and recognized. However, if I drop the (i) == 0 condition, the code does work, which implies that the loop works for the column (assignment) side:

for(i in (list_variables_v2)) {
  data_v1[d == 1, (i) := -1]
}

I think the root of the problem is that R, on the row-filtering side, is not recognizing (i) == 0 as e.g. a == 0. This is quite odd given that R, when dealing with the column side, does recognize (i) := -1 as e.g. a := -1. Any idea of what might be causing this and, hopefully, how to solve it?
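One way to check this suspicion outside data.table is to look at what (i) == 0 evaluates to on its own. The snippet below is a minimal sketch in base R; the vector a is a plain stand-in for a column, and string_test and column_test are names made up for illustration.

```r
i <- "a"
a <- c(0, 1, 2)            # stand-in for a column, not the real dataset

string_test <- (i) == 0    # compares the character string "a" with 0
column_test <- a == 0      # compares the values stored in a with 0

print(string_test)
print(column_test)
```

If the row filter sees the first form, the condition is a single FALSE rather than one value per row, so no row would qualify for the update.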

Again, many many thanks, and please let me know if something is unclear or repeated.

CodePudding user response:

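A simple correction is to wrap the name with get(). In the row filter, (i) is just the character string stored in i (e.g. "a"), so (i) == 0 compares that string with 0 and never matches; get(i) looks up the column with that name instead. On the left-hand side of :=, (i) works because data.table treats a parenthesised name there as a variable holding column names.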

for(i in (list_variables_v2)) {
  data_v1[get(i) == 0 & d == 1, (i) := -1]
}

-output

> data_v1
        a     b     c     d
    <int> <int> <int> <int>
 1:    -1    -1    -1     1
 2:     1     1     1     0
 3:     2     2     2     1
 4:     3     3     3     0
 5:     4     4     4     1
 6:     5     5     5     0
 7:     6     6     6     1
 8:     7     7     7     0
 9:     8     8     8     1
10:     9     9     9     0
11:    10    10    10     1

> data
        a     b     c     d
    <int> <int> <int> <int>
 1:     0     0     0     1
 2:     1     1     1     0
 3:     2     2     2     1
 4:     3     3     3     0
 5:     4     4     4     1
 6:     5     5     5     0
 7:     6     6     6     1
 8:     7     7     7     0
 9:     8     8     8     1
10:     9     9     9     0
11:    10    10    10     1
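As an alternative to get(i), the same recoding could probably be written with data.table's set(), which takes the column name as a string directly. This is only a sketch on the toy data from the question; data_v2, target_cols and rows are names made up for illustration, and d is spelled out at full length.

```r
library(data.table)

# Toy data from the question: a, b, c are recoded, d is the condition column
data <- data.table(a = 0:10, b = 0:10, c = 0:10, d = rep(1:0, length.out = 11))
data_v2 <- copy(data)                        # work on a copy, keep data intact

target_cols <- setdiff(names(data_v2), "d")  # every column except d

for (col in target_cols) {
  # data_v2[[col]] extracts the column by its name, so no get() is needed
  rows <- which(data_v2[[col]] == 0 & data_v2$d == 1)
  # set() updates the selected rows of that column by reference
  set(data_v2, i = rows, j = col, value = -1L)
}

print(data_v2)
```

Because the loop works on a copy, data itself is left unchanged, as in the output above.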