Divide numbers into different groups according to their adjacency relationships

I have a data frame that stores adjacency relations, and I want to divide the numbers into different groups according to it. The data frame is as follows:

df = data.frame(from=c(1,1,2,2,2,3,3,3,4,4,4,5,5), to=c(1,3,2,3,4,1,2,3,2,4,5,4,5))
df
   from to
1     1  1
2     1  3
3     2  2
4     2  3
5     2  4
6     3  1
7     3  2
8     3  3
9     4  2
10    4  4
11    4  5
12    5  4
13    5  5

In the data frame above, number 1 is linked to numbers 1 and 3, and number 2 is linked to numbers 2, 3 and 4. So number 1 cannot be in the same group as number 3, and number 2 cannot be in the same group as number 3 or number 4. In the end, the groups can be c(1, 2, 5) and c(3, 4).

How can I program this?

CodePudding user response:

First, replace the values of to with NA wherever from and to are equal, since a link from a number to itself places no constraint on the grouping.

df2 <- transform(df, to = replace(to, from == to, NA))

Then go through the rows in order and bind each row onto the accumulated result only if its from value has not already appeared in the to column of the rows kept so far.

Reduce(function(x, y) {
  if(y$from %in% x$to) x else rbind(x, y)
}, split(df2, 1:nrow(df2)))

#    from to
# 1     1 NA
# 2     1  3
# 3     2 NA
# 4     2  3
# 5     2  4
# 12    5  4
# 13    5 NA

Finally, you can extract the unique elements of each column to get the two groups: the unique values of from form one group, and the unique non-NA values of to form the other.
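As a minimal sketch of that last step, assuming the result of the Reduce() call is stored in a variable called res (a name introduced here only for illustration, as are group1 and group2), the groups could be read off like this:

```r
df <- data.frame(from = c(1,1,2,2,2,3,3,3,4,4,4,5,5),
                 to   = c(1,3,2,3,4,1,2,3,2,4,5,4,5))

# Self-links carry no constraint, so mark them as NA.
df2 <- transform(df, to = replace(to, from == to, NA))

# Keep a row only if its `from` has not yet appeared in `to` of the kept rows.
# `res` is a name chosen for this sketch.
res <- Reduce(function(x, y) {
  if (y$from %in% x$to) x else rbind(x, y)
}, split(df2, seq_len(nrow(df2))))

# One group from each column; NA placeholders are dropped from `to`.
group1 <- unique(res$from)
group2 <- unique(res$to[!is.na(res$to)])

group1
# [1] 1 2 5
group2
# [1] 3 4
```

Note that this only ever yields two groups, so it fits data like the example where two groups are enough.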

The overall pipeline is then as follows (the native pipe |> and the \(x) lambda shorthand require R 4.1 or later):

df |>
  transform(to = replace(to, from == to, NA)) |>
  (\(dat) split(dat, 1:nrow(dat)))() |>
  Reduce(f = \(x, y) if(y$from %in% x$to) x else rbind(x, y))

CodePudding user response:

Darren Tsai's answer solves this problem, but it is not without flaws.

The following is a rather clumsy solution:

df = data.frame(from=c(1,1,2,2,2,3,3,3,4,4,4,5,5), to=c(1,3,2,3,4,1,2,3,2,4,5,4,5))
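# Neighbour list: for each number, the numbers it is linked to
df.list = lapply(split(df, df$from), function(x) {
  x$to
})
# Start with every number in group 1
group.idx = rep(1, length(unique(df$from)))
for (i in seq_along(df.list)) {
  df.vec <- df.list[[i]]
  curr.group = group.idx[i]
  # Neighbours of i, excluding the self-link
  remain.vec = setdiff(df.vec, i)
  for (j in remain.vec) {
    # Move a neighbour that shares i's group to the next group
    if (group.idx[j] == curr.group) {
      group.idx[j] = curr.group + 1
    }
  }
}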
group.idx
[1] 1 1 2 2 1
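
To check a result like this one, the index vector can be turned into explicit groups and tested against the adjacency table. The sketch below assumes the numbers are exactly 1, 2, ..., n, so that position i of group.idx refers to number i. The helper is_valid_grouping is a hypothetical name made up for this example, not a function from any package.

```r
df <- data.frame(from = c(1,1,2,2,2,3,3,3,4,4,4,5,5),
                 to   = c(1,3,2,3,4,1,2,3,2,4,5,4,5))
group.idx <- c(1, 1, 2, 2, 1)  # the result printed above

# List the members of each group.
groups <- split(seq_along(group.idx), group.idx)
groups
# $`1`
# [1] 1 2 5
#
# $`2`
# [1] 3 4

# Hypothetical helper: TRUE if no linked pair (self-links ignored)
# ends up in the same group.
is_valid_grouping <- function(edges, idx) {
  e <- edges[edges$from != edges$to, ]
  !any(idx[e$from] == idx[e$to])
}

is_valid_grouping(df, group.idx)
# [1] TRUE
```

A check of this kind is worth running on other inputs too, because the loop above moves a conflicting number to the next group without re-examining the numbers already placed there.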