Home > Software engineering >  How can I stack my dataset so each observation relates to all other observations but itself, within
How can I stack my dataset so each observation relates to all other observations but itself, within

Time:11-24

EDIT: I took one observation out from the data frame of the original post and changed some values so writing manually is easier. I am also adding the desired output, so my question is easier to read.

This is a continuation to a question I made in another post:

How can I stack my dataset so each observation relates to all other observations but itself?

In that post, I asked how can I make a row relate to all other observations but itself. I am trying to apply the answers to my dataset, but the issue is that I have a dataset with country-year-party. In my actual dataset, I want an observation to relate to every other observation within country-year.

Say for example I have a data frame with 2 countries (id1) A and B:

df <- data.frame(id1 = c("A","A","A","B","B","B"),
                 id2 = c("a", "b", "c", "a", "b", "c" ),
                 x1 = c(1,2,3,1,2,3))

df

  id1 id2 x1
1   A   a  1
2   A   b  2
3   A   c  3
4   B   a  1
5   B   b  2
6   B   c  3

Each row in column id2 identifies one person a, b and c. I want each person to relate to every other person within country. So person a will be related to person b and c, but it has to be within country. I am trying the following codes:

df <- df %>% group_by(id1) %>% merge( df, by = NULL) %>% 
  filter(id2.x != id2.y) 

or even:

df <- df %>% group_by(id2) %>% 
  left_join(df, df, by = character()) %>%
  filter(id2.x != id2.y) 

But it leads to the following result:

   id1.x id2.x x1.x id1.y id2.y x1.y
1      A     b    2     A     a    1
2      A     c    3     A     a    1
3      B     b    2     A     a    1
4      B     c    3     A     a    1
5      A     a    1     A     b    2
6      A     c    3     A     b    2
7      B     a    1     A     b    2
8      B     c    3     A     b    2
9      A     a    1     A     c    3
10     A     b    2     A     c    3
11     B     a    1     A     c    3
12     B     b    2     A     c    3
13     A     b    2     B     a    1
14     A     c    3     B     a    1
15     B     b    2     B     a    1
16     B     c    3     B     a    1
17     A     a    1     B     b    2
18     A     c    3     B     b    2
19     B     a    1     B     b    2
20     B     c    3     B     b    2
21     A     a    1     B     c    3
22     A     b    2     B     c    3
23     B     a    1     B     c    3
24     B     b    2     B     c    3

Notice that in observation 3, person b in country B is related to person a in country A. This is what I am trying to avoid. I want person a to relate to b and c, but only within each country. How can i do that? The desired output would be something like this:

   id1.x id2.x x1.x id1.y id2.y x1.y
1      A     a    1     A     b    2
2      A     a    1     A     c    3
3      A     b    2     A     a    1
4      A     b    2     A     c    3
5      A     c    3     A     a    1
6      A     c    3     A     b    2
7      B     a    1     B     b    2
8      B     a    1     B     c    3
9      B     b    2     B     a    1
10     B     b    2     B     c    3
11     B     c    3     B     a    1
12     B     c    3     B     b    2

So, within each country A and B, each person a,b,c relates to each other but himself. I tried to clarify some questions and simplify my example, let me know if it is clear now and you need more clarification.

CodePudding user response:

Here is a base R option:

df <- data.frame(id1 = c("A","A","A","A","B","B","B","B"),
                 id2 = c("a", "b", "c", "d", "a", "b", "c", "d"),
                 x1 = c(1,2,3,4, 5,6,7,8))
#base option
by(df, df$id1, \(x){
  rws <- t(combn(seq(nrow(x)), 2))
  cbind(x[rws[,1],], x[rws[,2],2:3])  |>
    `colnames<-`(c("id1", "id2.x","x1.x", "id2.y", "x2.y")) 
}) |>
  do.call(what = rbind.data.frame)|>
    `row.names<-`(NULL)
#>    id1 id2.x x1.x id2.y x2.y
#> 1    A     a    1     b    2
#> 2    A     a    1     c    3
#> 3    A     a    1     d    4
#> 4    A     b    2     c    3
#> 5    A     b    2     d    4
#> 6    A     c    3     d    4
#> 7    B     a    5     b    6
#> 8    B     a    5     c    7
#> 9    B     a    5     d    8
#> 10   B     b    6     c    7
#> 11   B     b    6     d    8
#> 12   B     c    7     d    8

EDIT

here is a tidyverse option

library(tidyverse)

full_join(df, df, by = "id1") |>
  filter(id2.x != id2.y)
#>    id1 id2.x x1.x id2.y x1.y
#> 1    A     a    1     b    2
#> 2    A     a    1     c    3
#> 3    A     a    1     d    4
#> 4    A     b    2     a    1
#> 5    A     b    2     c    3
#> 6    A     b    2     d    4
#> 7    A     c    3     a    1
#> 8    A     c    3     b    2
#> 9    A     c    3     d    4
#> 10   A     d    4     a    1
#> 11   A     d    4     b    2
#> 12   A     d    4     c    3
#> 13   B     a    5     b    6
#> 14   B     a    5     c    7
#> 15   B     a    5     d    8
#> 16   B     b    6     a    5
#> 17   B     b    6     c    7
#> 18   B     b    6     d    8
#> 19   B     c    7     a    5
#> 20   B     c    7     b    6
#> 21   B     c    7     d    8
#> 22   B     d    8     a    5
#> 23   B     d    8     b    6
#> 24   B     d    8     c    7

CodePudding user response:

Building on @RitchieSacramento’s solution from your previous question, you can use expand_grid() inside group_modify().

library(dplyr)
library(tidyr)

df %>%
  group_by(id1) %>%
  group_modify(~ expand_grid(.x, .x, .name_repair = make.unique)) %>%
  ungroup() %>%
  filter(id2 != id2.1)
# A tibble: 12 × 5
   id1   id2      x1 id2.1  x1.1
   <chr> <chr> <dbl> <chr> <dbl>
 1 A     a         1 b         2
 2 A     a         1 c         3
 3 A     b         2 a         1
 4 A     b         2 c         3
 5 A     c         3 a         1
 6 A     c         3 b         2
 7 B     a         1 b         2
 8 B     a         1 c         3
 9 B     b         2 a         1
10 B     b         2 c         3
11 B     c         3 a         1
12 B     c         3 b         2

CodePudding user response:

df %>%
  group_by(id1)%>%
  mutate(vals=map(row_number(), ~cur_data_all()[-.x,]))%>%
  unnest(vals, names_sep = "_")

# A tibble: 12 × 6
# Groups:   id1 [2]
   id1   id2      x1 vals_id1 vals_id2 vals_x1
   <chr> <chr> <dbl> <chr>    <chr>      <dbl>
 1 A     a         1 A        b              2
 2 A     a         1 A        c              3
 3 A     b         2 A        a              1
 4 A     b         2 A        c              3
 5 A     c         3 A        a              1
 6 A     c         3 A        b              2
 7 B     a         1 B        b              2
 8 B     a         1 B        c              3
 9 B     b         2 B        a              1
10 B     b         2 B        c              3
11 B     c         3 B        a              1
12 B     c         3 B        b              2
  • Related