Duplicate rows based on common value in other column?


I have a data frame with two columns: column A holds individual IDs (each unique), and column B holds an action that several individuals have in common. That is, each person in column A took the action shown next to them in column B, and several people took the same action.

I'd like to write R code that creates new rows pairing up everyone in column A who shares the same value in column B.

That is, given this example:

person<-c("a", "b", "c", "d", "e", "f") 
action<-c("x", "x", "x", "y", "y", "y") 
data.frame(person, action)

I'd want to create this:

person1<-c("a", "a","b", "d", "d", "e") 
person2<-c("b", "c", "c", "e", "f","f")
data.frame(person1, person2)

CodePudding user response:

A method using group_modify() and combn():


library(dplyr)

# df is the data frame from the question: data.frame(person, action)
df %>%
  group_by(action) %>%
  group_modify(~ as_tibble(t(combn(pull(.x, person), 2))))

# A tibble: 6 × 3
# Groups:   action [2]
  action V1    V2   
  <chr>  <chr> <chr>
1 x      a     b    
2 x      a     c    
3 x      b     c    
4 y      d     e    
5 y      d     f    
6 y      e     f   
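
To make the combn() step concrete, here is a minimal sketch on a single group, using a hypothetical vector named people that is not part of the answer above. combn(x, 2) returns a matrix with one column per unordered pair, and t() flips it so that each pair becomes a row.

```r
# Hypothetical single group of people who share one action
people <- c("a", "b", "c")

# One column per unordered pair: (a, b), (a, c), (b, c)
pairs <- combn(people, 2)

# Transpose so that each pair is a row with two columns
pair_rows <- t(pairs)
pair_rows
```

Note that combn() needs at least two elements here: an action taken by only one person would raise an error (n < m), so such groups would need to be filtered out first.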

CodePudding user response:

How about this:

library(dplyr)
library(tidyr)

person <- c("a", "b", "c", "d", "e", "f")
action <- c("x", "x", "x", "y", "y", "y")
dat <- data.frame(person, action)

dat %>% 
  group_by(action) %>% 
  summarise(person = as.data.frame(t(combn(person, 2)))) %>% 
  unnest(person) %>% 
  rename(person1=V1, person2=V2)
#> `summarise()` has grouped output by 'action'. You can override using the
#> `.groups` argument.
#> # A tibble: 6 × 3
#> # Groups:   action [2]
#>   action person1 person2
#>   <chr>  <chr>   <chr>  
#> 1 x      a       b      
#> 2 x      a       c      
#> 3 x      b       c      
#> 4 y      d       e      
#> 5 y      d       f      
#> 6 y      e       f

CodePudding user response:

Here is a one-liner in base R.

person <- c("a", "b", "c", "d", "e", "f") 
action <- c("x", "x", "x", "y", "y", "y") 
df <- data.frame(person, action)

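# Split by action, build all pairs per group, stack the results, rename columns
setNames(
  do.call(rbind,
    lapply(split(df, df$action),
           function(x) as.data.frame(t(combn(x$person, 2))))),
  c("person1", "person2"))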

#     person1 person2
# x.1       a       b
# x.2       a       c
# x.3       b       c
# y.1       d       e
# y.2       d       f
# y.3       e       f

CodePudding user response:

Using base R (dat is the data frame defined in the earlier answer):

subset(merge(dat, dat, by = 'action'), person.x != person.y & 
  duplicated(paste(pmin(person.x, person.y), pmax(person.x, person.y))))
   action person.x person.y
4       x        b        a
7       x        c        a
8       x        c        b
13      y        e        d
16      y        f        d
17      y        f        e
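
If the pairs should come out in the same orientation as in the question (person1 sorting before person2), one possible variation on the self-merge is to keep only the rows where person.x sorts before person.y. This is a sketch that assumes person is a character column (stringsAsFactors = FALSE is set explicitly for older R versions); the names pairs, person1 and person2 are chosen here for illustration.

```r
person <- c("a", "b", "c", "d", "e", "f")
action <- c("x", "x", "x", "y", "y", "y")
dat <- data.frame(person, action, stringsAsFactors = FALSE)

# Self-join on action, then keep each unordered pair exactly once
# by requiring the left person to sort before the right person
pairs <- subset(merge(dat, dat, by = "action"), person.x < person.y)

# Rename to match the column names asked for in the question
names(pairs) <- c("action", "person1", "person2")
pairs
```

The comparison person.x < person.y drops both the self-pairs and the mirrored duplicates in one condition, so the duplicated() and pmin()/pmax() steps are not needed in this variant.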
