Create ID variable per chain of values

I have a dataset that looks like this:

data <- data.frame(Name1 = c("A", "B", "D", "E", "H"),
                   Name2 = c("B", "C", "E", "G", "I"))

I would like to add an ID column that lets me trace groups of names, i.e. who references whom. With the example data, the groups would be:

  Name1 Name2 GroupID
      A     B       1
      B     C       1
      D     E       2
      E     G       2
      H     I       3

Please note that my original data is not ordered as this example is. Thanks in advance for any help!

CodePudding user response:

You can use the igraph package to build an undirected graph from your data set and find its connected components; each component is one group of names:

data <- data.frame(Name1 = c("A", "B", "D", "E", "H"),
                   Name2 = c("B", "C", "E", "G", "I"))


library(igraph)
graph <- graph_from_data_frame(data, directed = FALSE)
clusters <- components(graph)

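# Look up each row's component by vertex name. as.character() guards against
# Name1 being a factor (the data.frame default before R 4.0), which would index by integer code.
data$GroupId <- unname(clusters$membership[as.character(data$Name1)])
# The same lookup, spelled out more verbosely:
# data$GroupId <- sapply(data$Name1, function(x) clusters$membership[which(names(clusters$membership) == x)])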

That gives:

> data
  Name1 Name2 GroupId
1     A     B       1
2     B     C       1
3     D     E       2
4     E     G       2
5     H     I       3
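
Since the question mentions that the real data is not ordered, here is a minimal sketch of the same approach on a shuffled copy of the example. The `shuffled` data frame and the final renumbering step are illustrative assumptions, not part of the original answer. The grouping does not depend on row order, because it comes from the graph's connected components; the `match()` line only renumbers the IDs by first appearance, in case you want them to count up from the top of the table.

```r
library(igraph)

# Hypothetical shuffled version of the example data:
# rows belonging to the same chain are no longer adjacent.
shuffled <- data.frame(Name1 = c("E", "H", "B", "D", "A"),
                       Name2 = c("G", "I", "C", "E", "B"),
                       stringsAsFactors = FALSE)

# Build the undirected graph and find its connected components.
g <- graph_from_data_frame(shuffled, directed = FALSE)
membership <- components(g)$membership

# Look up each row's component by the name in Name1.
shuffled$GroupId <- unname(membership[shuffled$Name1])

# Optional: renumber the groups in order of first appearance in the rows.
shuffled$GroupId <- match(shuffled$GroupId, unique(shuffled$GroupId))

print(shuffled)
```

After the renumbering step, rows 1 and 4 (E-G and D-E) share group 1, row 2 (H-I) is group 2, and rows 3 and 5 (B-C and A-B) share group 3.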