Home > database >  Combine rows per groups in R (or Stata)
Combine rows per groups in R (or Stata)

Time:03-19

I have an issue in R (or Stata) which I cannot solve. I have a database as the sample below:

Year ID Firm Group
2000 1 1 1
2000 2 1 1
2000 3 2 2
2000 4 2 2
2000 5 2 2
2000 6 3 3
2001 1 1 4
2001 2 1 4
2001 3 1 4
2001 4 2 5
2001 5 2 5
2001 6 3 6

I would need to have a list of interactions of IDs within the same group (i.e.: working at the same firm in a given year) for a network analysis. Something like the following:

Year ID Firm Group Interactions_A Interactions_B
2000 1 1 1 1 2
2000 2 1 1 3 4
2000 3 2 2 3 5
2000 4 2 2 4 5
2000 5 2 2 6
2000 6 3 3
2001 1 1 4 1 2
2001 2 1 4 1 3
2001 3 1 4 2 3
2001 4 2 5 4 5
2001 5 2 5 6
2001 6 3 6

Are there any suggestions to do this in R (can also be in Stata)?

Thanks a lot for your kind help!

CodePudding user response:

In R you can first get all the combinations between IDs of each group, and then format the data to get one column for each node.

library(tidyverse)

dat %>% 
  group_by(Group) %>% 
  summarise(new = ifelse(n() > 1, paste(combn(ID, 2), collapse = "-"), as.character(ID))) %>% 
  separate_rows(new, sep = "(?:[^-]*(?:-[^-]*){1})\\K-") %>% 
  separate(new, into = c("Interactions_A", "Interactions_B"))

   Group Interactions_A Interactions_B
   <int> <chr>          <chr>         
 1     1 1              2             
 2     2 3              4             
 3     2 3              5             
 4     2 4              5             
 5     3 6              NA            
 6     4 1              2             
 7     4 1              3             
 8     4 2              3             
 9     5 4              5             
10     6 6              NA 

data

dat <- read.table(header= T, text = "Year   ID  Firm    Group
2000    1   1   1
2000    2   1   1
2000    3   2   2
2000    4   2   2
2000    5   2   2
2000    6   3   3
2001    1   1   4
2001    2   1   4
2001    3   1   4
2001    4   2   5
2001    5   2   5
2001    6   3   6")
  • Related