How to group interconnected elements in R dplyr

Time:03-16

I have a data frame that looks like this. Elements in col1 are connected, directly or indirectly, with elements in col2. For example, 1 is connected with 2 and 3, and 2 is connected with 3, so 1, 2 and 3 should all end up in the same group.

library(tidyverse)

df1 <- tibble(col1=c(1,1,2,5,5,6), 
              col2=c(2,3,3,6,7,7))
df1
#> # A tibble: 6 × 2
#>    col1  col2
#>   <dbl> <dbl>
#> 1     1     2
#> 2     1     3
#> 3     2     3
#> 4     5     6
#> 5     5     7
#> 6     6     7

Created on 2022-03-15 by the reprex package (v2.0.1)

I want my data to look like this:

#>    col1  col2 col3
#>   <dbl> <dbl> <chr>
#> 1     1     2 group1
#> 2     1     3 group1
#> 3     2     3 group1
#> 4     5     6 group2
#> 5     5     7 group2
#> 6     6     7 group2

I would appreciate any help with solving this. Thank you for your time.

CodePudding user response:

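We may use igraph: treat each row as an edge between col1 and col2, then label every row with the connected component its col1 value belongs to.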

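library(igraph)
library(dplyr)
library(stringr)

# Each row of df1 is an edge; direction does not matter for grouping
g <- graph_from_data_frame(df1, directed = FALSE)
# components() gives every vertex a component id; the membership vector
# is named by vertex, so it can be indexed with col1 as character
df1 %>% 
   mutate(col3 = str_c("group", components(g)$membership[as.character(col1)]))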

Output:

# A tibble: 6 × 3
   col1  col2 col3  
  <dbl> <dbl> <chr> 
1     1     2 group1
2     1     3 group1
3     2     3 group1
4     5     6 group2
5     5     7 group2
6     6     7 group2
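
To see what the component labelling does without igraph, here is a minimal base R sketch of the same idea using a simple union-find. The helper name `label_components` is hypothetical (it is not from igraph or any other package), and the sketch assumes the toy data from the question. It is meant as an illustration, not as a replacement for the igraph approach on large data.

```r
# Toy edge list from the question (plain data.frame, no packages needed)
df1 <- data.frame(col1 = c(1, 1, 2, 5, 5, 6),
                  col2 = c(2, 3, 3, 6, 7, 7))

# Hypothetical helper: returns one group label per edge (row)
label_components <- function(from, to) {
  from <- as.character(from)
  to <- as.character(to)
  nodes <- unique(c(from, to))
  # parent[[x]] points from node x towards the representative of its group
  parent <- setNames(nodes, nodes)

  # Follow parent pointers until a node that points to itself is reached
  find_root <- function(x) {
    while (parent[[x]] != x) x <- parent[[x]]
    x
  }

  # Merge the groups of the two endpoints of every edge
  for (i in seq_along(from)) {
    a <- find_root(from[i])
    b <- find_root(to[i])
    if (a != b) parent[[b]] <- a
  }

  # Number the groups in order of first appearance
  roots <- vapply(from, find_root, character(1), USE.NAMES = FALSE)
  paste0("group", match(roots, unique(roots)))
}

df1$col3 <- label_components(df1$col1, df1$col2)
df1
```

Because both endpoints of a row always land in the same group, looking up the group through col1 alone (as the igraph answer does) is enough.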