Comparing partitions from split() using a nested for loop containing an if statement

Time:11-26

Consider the MWE below, which splits a distance matrix and attempts to compare the resulting partitions:

set.seed(1234) # set random seed for reproducibility

# generate random normal variates
x <- rnorm(5)
y <- rnorm(5)

df <- data.frame(x, y) # combine vectors into a data frame
d <- dist(x) # pairwise distances between the 5 values of x (10 in total)

splt <- split(d, 1:5) # split the 10 distances into 5 partitions of 2 values each (1:5 is recycled)

# compare partitions

for (i in 1:length(splt)) {
  for (j in 1:length(splt)) {
    if (splt[[i]] != splt[[j]]) {
      a <- length(which(splt[[i]] >= min(splt[[j]]))) / length(splt[[i]])
      b <- length(which(splt[[j]] <= max(splt[[i]]))) / length(splt[[j]])
    }
  }
}
# Error in if (splt[[i]] != splt[[j]]) { : the condition has length > 1

The loop above is meant to compare every unique pair of partitions, i.e., (1, 2), (1, 3), ..., (4, 5). However, it stops with the error "the condition has length > 1".
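For reference, the unique pairs (i, j) with i < j can be listed with base R's combn(), where each column of the result is one pair. This sketch assumes 5 partitions, as produced by split(d, 1:5):

```r
n <- 5                # number of partitions returned by split(d, 1:5)
pairs <- combn(n, 2)  # 2 x choose(n, 2) matrix, one unordered pair per column
t(pairs)              # rows (1, 2), (1, 3), ..., (4, 5)
ncol(pairs)           # choose(5, 2) pairs in total
```

A nested loop with i in 1:(n - 1) and j in (i + 1):n visits the same pairs in the same order.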

For instance, comparing partition 1 (splt[[1]]) with partition 2 (splt[[2]]) should give a = b = 1:

a <- length(which(splt[[1]] >= min(splt[[2]]))) / length(splt[[1]])
b <- length(which(splt[[2]] <= max(splt[[1]]))) / length(splt[[2]])
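The same two proportions can be written more compactly: mean() of a logical vector is the share of TRUE values, so mean(p >= min(q)) gives the same number as length(which(p >= min(q))) / length(p) when there are no NAs. A minimal sketch, where overlap_ab() is a hypothetical helper name:

```r
# Hypothetical helper: the a and b proportions for two partitions p and q
overlap_ab <- function(p, q) {
  c(a = mean(p >= min(q)),  # share of p at or above the smallest value of q
    b = mean(q <= max(p)))  # share of q at or below the largest value of p
}

set.seed(1234)
d <- dist(rnorm(5))    # same distances as dist(x) in the question
splt <- split(d, 1:5)  # 5 partitions of 2 distances each
overlap_ab(splt[[1]], splt[[2]])
```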

I thought the solution might be to use ifelse() instead, but there is no else branch inside the nested loop.

Any ideas on how to proceed?

Answer:

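Is your problem the error message, that is, why R does not accept the comparison splt[[i]] != splt[[j]] inside if()? The reason is that != is vectorized: comparing two vectors returns one logical value per element, whereas if() needs a single TRUE or FALSE: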

> splt[[1]] != splt[[2]]
[1] TRUE TRUE
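A small illustration with two made-up vectors p and q (not from the question): the element-wise comparison and ifelse() both return one value per element, whereas all(), any() and identical() return a single logical that if() accepts. Since R 4.2.0, passing a condition of length greater than one to if() is an error rather than a warning.

```r
p <- c(0.5, 1.2)
q <- c(0.5, 3.4)

p != q                            # FALSE TRUE: one result per element
ifelse(p != q, "differ", "same")  # "same" "differ": still one result per element

# Each of these returns a single TRUE/FALSE, so it can be used inside if()
!all(p == q)      # TRUE: not every entry is equal
any(p != q)       # TRUE: same meaning as !all(p == q) when there are no NAs
!identical(p, q)  # TRUE: also checks type and attributes, not only values

if (!all(p == q)) message("p and q differ in at least one entry")
```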

If I understand you correctly, splt[[i]] is equal to splt[[j]] when all their entries are equal, and different otherwise. If so, change the condition to !(all(splt[[i]] == splt[[j]])), which collapses the element-wise comparison into a single logical value. The whole loop then looks like this:

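for (i in seq_along(splt)) {
    for (j in seq_along(splt)) {
        # all() reduces the element-wise comparison to a single TRUE/FALSE
        if (!(all(splt[[i]] == splt[[j]]))) {
            # share of partition i at or above the minimum of partition j
            a <- length(which(splt[[i]] >= min(splt[[j]]))) / length(splt[[i]])
            # share of partition j at or below the maximum of partition i
            b <- length(which(splt[[j]] <= max(splt[[i]]))) / length(splt[[j]])
        }
    }
}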
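The loop above visits both (i, j) and (j, i), and because a and b are overwritten on every pass, only the values from the last compared pair remain after the loop. If the goal is the unique pairs listed in the question, one possible sketch starts j at i + 1, which also makes the value-based check unnecessary (two different partitions that happened to hold equal values would otherwise be skipped). The results are collected in a data frame, here called res (a name chosen for illustration):

```r
set.seed(1234)
x <- rnorm(5)
d <- dist(x)           # 10 pairwise distances
splt <- split(d, 1:5)  # 5 partitions of 2 distances each

n <- length(splt)
res <- data.frame(i = integer(0), j = integer(0), a = numeric(0), b = numeric(0))

for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {  # j > i: each unordered pair once, never i == j
    a <- length(which(splt[[i]] >= min(splt[[j]]))) / length(splt[[i]])
    b <- length(which(splt[[j]] <= max(splt[[i]]))) / length(splt[[j]])
    res <- rbind(res, data.frame(i = i, j = j, a = a, b = b))
  }
}

res
```

Growing res with rbind() is fine for 10 pairs; with many partitions, preallocating the result or iterating over the columns of combn() with vapply() would scale better.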