I have an issue in R (or Stata) which I cannot solve. I have a database as the sample below:
Year | ID | Firm | Group |
---|---|---|---|
2000 | 1 | 1 | 1 |
2000 | 2 | 1 | 1 |
2000 | 3 | 2 | 2 |
2000 | 4 | 2 | 2 |
2000 | 5 | 2 | 2 |
2000 | 6 | 3 | 3 |
2001 | 1 | 1 | 4 |
2001 | 2 | 1 | 4 |
2001 | 3 | 1 | 4 |
2001 | 4 | 2 | 5 |
2001 | 5 | 2 | 5 |
2001 | 6 | 3 | 6 |
I would need to have a list of interactions of IDs within the same group (i.e.: working at the same firm in a given year) for a network analysis. Something like the following:
Year | ID | Firm | Group | Interactions_A | Interactions_B |
---|---|---|---|---|---|
2000 | 1 | 1 | 1 | 1 | 2 |
2000 | 2 | 1 | 1 | 3 | 4 |
2000 | 3 | 2 | 2 | 3 | 5 |
2000 | 4 | 2 | 2 | 4 | 5 |
2000 | 5 | 2 | 2 | 6 | |
2000 | 6 | 3 | 3 | ||
2001 | 1 | 1 | 4 | 1 | 2 |
2001 | 2 | 1 | 4 | 1 | 3 |
2001 | 3 | 1 | 4 | 2 | 3 |
2001 | 4 | 2 | 5 | 4 | 5 |
2001 | 5 | 2 | 5 | 6 | |
2001 | 6 | 3 | 6 |
Are there any suggestions to do this in R (can also be in Stata)?
Thanks a lot for your kind help!
CodePudding user response:
In R you can first get all the combinations between IDs of each group, and then format the data to get one column for each node.
library(tidyverse)
dat %>%
group_by(Group) %>%
summarise(new = ifelse(n() > 1, paste(combn(ID, 2), collapse = "-"), as.character(ID))) %>%
separate_rows(new, sep = "(?:[^-]*(?:-[^-]*){1})\\K-") %>%
separate(new, into = c("Interactions_A", "Interactions_B"))
Group Interactions_A Interactions_B
<int> <chr> <chr>
1 1 1 2
2 2 3 4
3 2 3 5
4 2 4 5
5 3 6 NA
6 4 1 2
7 4 1 3
8 4 2 3
9 5 4 5
10 6 6 NA
data
dat <- read.table(header= T, text = "Year ID Firm Group
2000 1 1 1
2000 2 1 1
2000 3 2 2
2000 4 2 2
2000 5 2 2
2000 6 3 3
2001 1 1 4
2001 2 1 4
2001 3 1 4
2001 4 2 5
2001 5 2 5
2001 6 3 6")