I am struggling with nested for loops and conditional calculations.
Let's say I have a data frame like this:
df = data.frame("a" = c(2, 3, 3, 4),
                "b" = c(4, 4, 4, 4),
                "c" = c(5, 5, 4, 4),
                "d" = c(3, 4, 4, 2))
With this df, I want to compare the elements of the vectors row by row under a condition: if the absolute difference between two elements is lower than 2 (so a difference of 0 or 1), I'd like to append 1 to a newly created vector, and if the absolute difference is >= 2, I'd like to append 0.
For example, comparing the vector "a" with the other vectors "b", "c", "d", I want this result for the first row: 0 0 1. The first 0 comes from the difference of 2 between a1 and b1; the second 0 comes from the difference of 3 between a1 and c1; the 1 comes from the difference of 1 between a1 and d1. So I tried to write a nested for loop that applies the same procedure to the elements in the following rows as well.
My first attempt was this:
list_all = list(df$a, df$b, df$c, df$d)
v0<-c()
for (i in list_all)
  for (j in list_all)
    if (i != j) {
      if(abs(i-j)<2) {
        v0<-c(v0, 1)
      } else {
        v0<-append(v0, 0)
      }} else {
        next}
The result is:
v0
[1] 0 0 1 0 1 1 0 1 0 1 1 0
But it seems that the calculation was made only among the first elements, not among the following ones.
So my second attempt was this:
list = list(df$b, df$c, df$d)
v1<-c()
for (i in df$a){
  for (j in list){
    if(abs(i-j)<2) {
      v1<-append(v1, 1)
    } else {
      v1<-append(v1, 0)
    }
  }
}
v1
[1] 0 0 1 1 0 1 1 0 1 1 1 1
It seems that the calculations were made between every element of df$a and ONLY the first elements of the other vectors. So this is not what I needed, either.
When I put df$b instead of list in the nested for loop, the result is even messier.
v2<-c()
for (i in df$a){
  for (j in df$b){
    if(abs(i-j)<2) {
      v2<-append(v2, 1)
    } else {
      v2<-append(v2, 0)
    }
  }
}
v2
[1] 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1
It seems that the calculation was not made between the corresponding elements (those in the same row), but between every pair of elements regardless of position.
Could anyone tell me how to fix the problem? I don't understand why the nested for loop works only for the first elements.
Thank you in advance.
CodePudding user response:
I'm not sure if I understood it all correctly, but how about this?
df = data.frame("a" = c(2, 3, 3, 4),
                "b" = c(4, 4, 4, 4),
                "c" = c(5, 5, 4, 4),
                "d" = c(3, 4, 4, 2))
as.vector(apply(df, 1, \(x) ifelse(abs(x[1] - x[2:4]) < 2, 1, 0)))
#> [1] 0 0 1 1 0 1 1 1 1 1 1 0
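For comparison with the loops in the question, here is a minimal sketch of an explicit loop that gives the same vector. In the question, `for (i in list_all)` hands the loop whole columns, so `abs(i - j) < 2` is a logical vector of length 4 and `if` cannot use all of it (older R versions warn and use only the first value; since R 4.2.0 this is an error). Indexing by row keeps every comparison a single number. The names `others` and `gap` are only for illustration.

```r
df <- data.frame(a = c(2, 3, 3, 4),
                 b = c(4, 4, 4, 4),
                 c = c(5, 5, 4, 4),
                 d = c(3, 4, 4, 2))

others <- c("b", "c", "d")  # hypothetical name: columns compared against a
v <- numeric(0)

for (i in seq_len(nrow(df))) {            # one row at a time
  for (col in others) {                   # one comparison column at a time
    gap <- abs(df$a[i] - df[[col]][i])    # both sides are single numbers
    v <- c(v, if (gap < 2) 1 else 0)      # scalar condition, so if() is safe
  }
}
v
#> [1] 0 0 1 1 0 1 1 1 1 1 1 0
```

Growing a vector with c() inside a loop is fine for four rows, but the apply() call above (or a vectorised comparison) will scale better.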
CodePudding user response:
I think you're making life unnecessarily complicated for yourself. If I understand you correctly, you can do what you want without nesting loops at all.
The key thing to remember is that R is vectorised by default. That means R operates on all elements of a vector at once, so there's no need to loop. For example, if a is a vector with values 1 and 2 and I write a + 1, the result will be a vector with values 2 and 3.
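As a quick illustration of vectorisation, here is a small sketch; the vectors x and y are just copies of your columns a and b, and the result names are made up for the example.

```r
a <- c(1, 2)
plus_one <- a + 1        # 1 is added to every element of a
plus_one
#> [1] 2 3

x <- c(2, 3, 3, 4)       # same values as column a in the question
y <- c(4, 4, 4, 4)       # same values as column b in the question
close <- abs(x - y) < 2  # compared element by element, row 1 with row 1, etc.
close
#> [1] FALSE  TRUE  TRUE  TRUE
```

The comparison returns one TRUE/FALSE per row, which is exactly what ifelse() needs as its first argument.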
Applying this logic to your case, you can write:
df$diffB <- ifelse(abs(df$a-df$b) < 2, 1, 0)
df$diffC <- ifelse(abs(df$a-df$c) < 2, 1, 0)
df$diffD <- ifelse(abs(df$a-df$d) < 2, 1, 0)
df
Giving
  a b c d diffB diffC diffD
1 2 4 5 3     0     0     1
2 3 4 5 4     1     0     1
3 3 4 4 4     1     1     1
4 4 4 4 2     1     1     0
You can write a loop to loop over columns if you wish, and Aron has given you one option to do this in his answer.
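If you want to stay in base R, one possible way to loop over the columns is sapply(). This is only a sketch: the names cols and flags are introduced for illustration, and as.integer() on the logical comparison is used in place of ifelse().

```r
df <- data.frame(a = c(2, 3, 3, 4),
                 b = c(4, 4, 4, 4),
                 c = c(5, 5, 4, 4),
                 d = c(3, 4, 4, 2))

cols <- c("b", "c", "d")  # hypothetical name: columns to compare against a

# For each column, flag the rows whose absolute difference from a is below 2
flags <- sapply(df[cols], function(x) as.integer(abs(df$a - x) < 2))
colnames(flags) <- paste0("diff", toupper(cols))

result <- cbind(df, flags)  # original columns plus diffB, diffC, diffD
result
```

This should give the same diffB, diffC and diffD columns as the three ifelse() lines, without repeating the expression for each column.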
Personally, I find that using the tidyverse results in code that's easier to understand than code written in base R. This is because I can read tidyverse code from left to right, whereas base R code (often) needs to be read from the inside out. The tidyverse's syntax is more consistent than base R's as well.
Here's how I would solve your problem using the tidyverse:
library(tidyverse)
df %>%
  mutate(
    diffB=ifelse(abs(a-b) < 2, 1, 0),
    diffC=ifelse(abs(a-c) < 2, 1, 0),
    diffD=ifelse(abs(a-d) < 2, 1, 0)
  )
And the "loop over columns" becomes
df %>%
  mutate(
    across(
      c(b, c, d),
      ~ifelse(abs(a-.x) < 2, 1, 0),
      .names="diff{.col}"
    )
  )