EDIT: I took one observation out from the data frame of the original post and changed some values so writing manually is easier. I am also adding the desired output, so my question is easier to read.
This is a continuation to a question I made in another post:
How can I stack my dataset so each observation relates to all other observations but itself?
In that post, I asked how can I make a row relate to all other observations but itself. I am trying to apply the answers to my dataset, but the issue is that I have a dataset with country-year-party. In my actual dataset, I want an observation to relate to every other observation within country-year.
Say for example I have a data frame with 2 countries (id1) A and B:
df <- data.frame(id1 = c("A","A","A","B","B","B"),
id2 = c("a", "b", "c", "a", "b", "c" ),
x1 = c(1,2,3,1,2,3))
df
id1 id2 x1
1 A a 1
2 A b 2
3 A c 3
4 B a 1
5 B b 2
6 B c 3
Each row in column id2 identifies one person a, b and c. I want each person to relate to every other person within country. So person a will be related to person b and c, but it has to be within country. I am trying the following codes:
df <- df %>% group_by(id1) %>% merge( df, by = NULL) %>%
filter(id2.x != id2.y)
or even:
df <- df %>% group_by(id2) %>%
left_join(df, df, by = character()) %>%
filter(id2.x != id2.y)
But it leads to the following result:
id1.x id2.x x1.x id1.y id2.y x1.y
1 A b 2 A a 1
2 A c 3 A a 1
3 B b 2 A a 1
4 B c 3 A a 1
5 A a 1 A b 2
6 A c 3 A b 2
7 B a 1 A b 2
8 B c 3 A b 2
9 A a 1 A c 3
10 A b 2 A c 3
11 B a 1 A c 3
12 B b 2 A c 3
13 A b 2 B a 1
14 A c 3 B a 1
15 B b 2 B a 1
16 B c 3 B a 1
17 A a 1 B b 2
18 A c 3 B b 2
19 B a 1 B b 2
20 B c 3 B b 2
21 A a 1 B c 3
22 A b 2 B c 3
23 B a 1 B c 3
24 B b 2 B c 3
Notice that in observation 3, person b in country B is related to person a in country A. This is what I am trying to avoid. I want person a to relate to b and c, but only within each country. How can i do that? The desired output would be something like this:
id1.x id2.x x1.x id1.y id2.y x1.y
1 A a 1 A b 2
2 A a 1 A c 3
3 A b 2 A a 1
4 A b 2 A c 3
5 A c 3 A a 1
6 A c 3 A b 2
7 B a 1 B b 2
8 B a 1 B c 3
9 B b 2 B a 1
10 B b 2 B c 3
11 B c 3 B a 1
12 B c 3 B b 2
So, within each country A and B, each person a,b,c relates to each other but himself. I tried to clarify some questions and simplify my example, let me know if it is clear now and you need more clarification.
CodePudding user response:
Here is a base R option:
df <- data.frame(id1 = c("A","A","A","A","B","B","B","B"),
id2 = c("a", "b", "c", "d", "a", "b", "c", "d"),
x1 = c(1,2,3,4, 5,6,7,8))
#base option
by(df, df$id1, \(x){
rws <- t(combn(seq(nrow(x)), 2))
cbind(x[rws[,1],], x[rws[,2],2:3]) |>
`colnames<-`(c("id1", "id2.x","x1.x", "id2.y", "x2.y"))
}) |>
do.call(what = rbind.data.frame)|>
`row.names<-`(NULL)
#> id1 id2.x x1.x id2.y x2.y
#> 1 A a 1 b 2
#> 2 A a 1 c 3
#> 3 A a 1 d 4
#> 4 A b 2 c 3
#> 5 A b 2 d 4
#> 6 A c 3 d 4
#> 7 B a 5 b 6
#> 8 B a 5 c 7
#> 9 B a 5 d 8
#> 10 B b 6 c 7
#> 11 B b 6 d 8
#> 12 B c 7 d 8
EDIT
here is a tidyverse option
library(tidyverse)
full_join(df, df, by = "id1") |>
filter(id2.x != id2.y)
#> id1 id2.x x1.x id2.y x1.y
#> 1 A a 1 b 2
#> 2 A a 1 c 3
#> 3 A a 1 d 4
#> 4 A b 2 a 1
#> 5 A b 2 c 3
#> 6 A b 2 d 4
#> 7 A c 3 a 1
#> 8 A c 3 b 2
#> 9 A c 3 d 4
#> 10 A d 4 a 1
#> 11 A d 4 b 2
#> 12 A d 4 c 3
#> 13 B a 5 b 6
#> 14 B a 5 c 7
#> 15 B a 5 d 8
#> 16 B b 6 a 5
#> 17 B b 6 c 7
#> 18 B b 6 d 8
#> 19 B c 7 a 5
#> 20 B c 7 b 6
#> 21 B c 7 d 8
#> 22 B d 8 a 5
#> 23 B d 8 b 6
#> 24 B d 8 c 7
CodePudding user response:
Building on @RitchieSacramento’s solution from your previous question, you can use expand_grid()
inside group_modify()
.
library(dplyr)
library(tidyr)
df %>%
group_by(id1) %>%
group_modify(~ expand_grid(.x, .x, .name_repair = make.unique)) %>%
ungroup() %>%
filter(id2 != id2.1)
# A tibble: 12 × 5
id1 id2 x1 id2.1 x1.1
<chr> <chr> <dbl> <chr> <dbl>
1 A a 1 b 2
2 A a 1 c 3
3 A b 2 a 1
4 A b 2 c 3
5 A c 3 a 1
6 A c 3 b 2
7 B a 1 b 2
8 B a 1 c 3
9 B b 2 a 1
10 B b 2 c 3
11 B c 3 a 1
12 B c 3 b 2
CodePudding user response:
df %>%
group_by(id1)%>%
mutate(vals=map(row_number(), ~cur_data_all()[-.x,]))%>%
unnest(vals, names_sep = "_")
# A tibble: 12 × 6
# Groups: id1 [2]
id1 id2 x1 vals_id1 vals_id2 vals_x1
<chr> <chr> <dbl> <chr> <chr> <dbl>
1 A a 1 A b 2
2 A a 1 A c 3
3 A b 2 A a 1
4 A b 2 A c 3
5 A c 3 A a 1
6 A c 3 A b 2
7 B a 1 B b 2
8 B a 1 B c 3
9 B b 2 B a 1
10 B b 2 B c 3
11 B c 3 B a 1
12 B c 3 B b 2