Combine rows per groups in R (or Stata)-CodePudding

I have an issue in R (or Stata) which I cannot solve. I have a database as the sample below:

Year	ID	Firm	Group
2000	1	1	1
2000	2	1	1
2000	3	2	2
2000	4	2	2
2000	5	2	2
2000	6	3	3

2001	1	1	4
2001	2	1	4
2001	3	1	4
2001	4	2	5
2001	5	2	5
2001	6	3	6

I would need to have a list of interactions of IDs within the same group (i.e.: working at the same firm in a given year) for a network analysis. Something like the following:

Year	ID	Firm	Group	Interactions_A	Interactions_B
2000	1	1	1	1	2
2000	2	1	1	3	4
2000	3	2	2	3	5
2000	4	2	2	4	5
2000	5	2	2	6
2000	6	3	3

2001	1	1	4	1	2
2001	2	1	4	1	3
2001	3	1	4	2	3
2001	4	2	5	4	5
2001	5	2	5	6
2001	6	3	6

Are there any suggestions to do this in R (can also be in Stata)?

Thanks a lot for your kind help!

CodePudding user response：

In R you can first get all the combinations between IDs of each group, and then format the data to get one column for each node.

library(tidyverse)

dat %>% 
  group_by(Group) %>% 
  summarise(new = ifelse(n() > 1, paste(combn(ID, 2), collapse = "-"), as.character(ID))) %>% 
  separate_rows(new, sep = "(?:[^-]*(?:-[^-]*){1})\\K-") %>% 
  separate(new, into = c("Interactions_A", "Interactions_B"))

   Group Interactions_A Interactions_B
   <int> <chr>          <chr>         
 1     1 1              2             
 2     2 3              4             
 3     2 3              5             
 4     2 4              5             
 5     3 6              NA            
 6     4 1              2             
 7     4 1              3             
 8     4 2              3             
 9     5 4              5             
10     6 6              NA

data

dat <- read.table(header= T, text = "Year   ID  Firm    Group
2000    1   1   1
2000    2   1   1
2000    3   2   2
2000    4   2   2
2000    5   2   2
2000    6   3   3
2001    1   1   4
2001    2   1   4
2001    3   1   4
2001    4   2   5
2001    5   2   5
2001    6   3   6")