I have a dataframe that stores adjacency relations. I want to divide the numbers into different groups according to this dataframe. The dataframe is as follows:
df = data.frame(from=c(1,1,2,2,2,3,3,3,4,4,4,5,5), to=c(1,3,2,3,4,1,2,3,2,4,5,4,5))
df
   from to
1     1  1
2     1  3
3     2  2
4     2  3
5     2  4
6     3  1
7     3  2
8     3  3
9     4  2
10    4  4
11    4  5
12    5  4
13    5  5
In the above dataframe, number 1 has links with numbers 1 and 3, and number 2 has links with numbers 2, 3, and 4, so number 1 cannot be in the same group as number 3, and number 2 cannot be in the same group as number 3 or number 4. In the end, the groups can be c(1, 2, 5) and c(3, 4).
I wonder how to program it?
CodePudding user response:
First, replace the values of to with NA when from and to are equal.
df2 <- transform(df, to = replace(to, from == to, NA))
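To make the effect of this step concrete, here is a small sketch that rebuilds the question's df and inspects the to column afterwards. Only the self-links (rows where from equals to) become NA; every other link is left untouched.

```r
# Rebuild the example data from the question
df <- data.frame(from = c(1,1,2,2,2,3,3,3,4,4,4,5,5),
                 to   = c(1,3,2,3,4,1,2,3,2,4,5,4,5))

# Self-links carry no grouping information, so blank them out
df2 <- transform(df, to = replace(to, from == to, NA))

df2$to
# [1] NA  3 NA  3  4  1  2 NA  2 NA  5  4 NA
```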
Then use Reduce to build the result row by row: a row is appended only if its from value has not already appeared in the to column of the rows kept so far.
Reduce(function(x, y) {
  if(y$from %in% x$to) x else rbind(x, y)
}, split(df2, 1:nrow(df2)))
#    from to
# 1     1 NA
# 2     1  3
# 3     2 NA
# 4     2  3
# 5     2  4
# 12    5  4
# 13    5 NA
Finally, extract the unique elements of each column to get the two groups: the from column gives one group and the non-NA values of the to column give the other.
The overall pipeline is:
df |>
  transform(to = replace(to, from == to, NA)) |>
  (\(dat) split(dat, 1:nrow(dat)))() |>
  Reduce(f = \(x, y) if(y$from %in% x$to) x else rbind(x, y))
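The last step, extracting the groups, is not shown in code above, so here is a minimal sketch of it. The names kept, group1 and group2 are introduced only for illustration, and the result has been checked against this example only, not against other inputs.

```r
# Rebuild the example data from the question
df <- data.frame(from = c(1,1,2,2,2,3,3,3,4,4,4,5,5),
                 to   = c(1,3,2,3,4,1,2,3,2,4,5,4,5))

# Steps 1 and 2 from the answer: blank out self-links, then fold over the rows
df2  <- transform(df, to = replace(to, from == to, NA))
kept <- Reduce(function(x, y) if (y$from %in% x$to) x else rbind(x, y),
               split(df2, 1:nrow(df2)))

# Step 3: one group from each column, dropping the NA placeholders
group1 <- unique(kept$from)
group2 <- unique(kept$to[!is.na(kept$to)])

group1
# [1] 1 2 5
group2
# [1] 3 4
```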
CodePudding user response:
Darren Tsai's answer solves this problem, but it has some flaws.
The following is a rather clumsy alternative:
df = data.frame(from=c(1,1,2,2,2,3,3,3,4,4,4,5,5), to=c(1,3,2,3,4,1,2,3,2,4,5,4,5))
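# For each from value, collect the numbers it links to
df.list = lapply(split(df, df$from), function(x) {
  x$to
})
# Start with every number in group 1
group.idx = rep(1, length(unique(df$from)))
for (i in seq_along(df.list)) {
  df.vec <- df.list[[i]]
  curr.group = group.idx[i]
  # Neighbours of i, excluding the self-link
  remain.vec = setdiff(df.vec, i)
  for (j in remain.vec) {
    # Move a neighbour that shares i's group into the next group
    if (group.idx[j] == curr.group) {
      group.idx[j] = curr.group + 1
    }
  }
}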
group.idx
[1] 1 1 2 2 1
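The vector above gives a group label for each number. As a small sketch (the names groups, edges and conflict are my own), it can be turned into the c(1, 2, 5) / c(3, 4) form asked for in the question, and checked against the original links. This assumes, as in the example, that the numbers are 1 to n so they can be used directly as indices.

```r
# Example data and the group labels printed above
df <- data.frame(from = c(1,1,2,2,2,3,3,3,4,4,4,5,5),
                 to   = c(1,3,2,3,4,1,2,3,2,4,5,4,5))
group.idx <- c(1, 1, 2, 2, 1)

# List the members of each group
groups <- split(seq_along(group.idx), group.idx)
groups
# $`1`
# [1] 1 2 5
#
# $`2`
# [1] 3 4

# Sanity check: ignoring self-links, no link should join two numbers
# that ended up in the same group
edges    <- df[df$from != df$to, ]
conflict <- group.idx[edges$from] == group.idx[edges$to]
any(conflict)
# [1] FALSE
```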