I have a data frame with two columns: the first holds individual IDs (each unique), and the second holds an action that several individuals share. That is, everyone in a group took the same action, which is shown in the second column.
I'd like to write R code that creates one new row for every pair of people who share the same action.
That is, given this example:
person<-c("a", "b", "c", "d", "e", "f")
action<-c("x", "x", "x", "y", "y", "y")
data.frame(person, action)
I'd want to create this:
person1<-c("a", "a","b", "d", "d", "e")
person2<-c("b", "c", "c", "e", "f","f")
data.frame(person1, person2)
CodePudding user response:
A method using group_modify() and combn():
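library(dplyr)
# df is the question's data frame
df <- data.frame(person, action)
df %>%
  group_by(action) %>%
  # combn() returns one pair per column, so transpose to get one pair per row
  group_modify(~ as_tibble(t(combn(pull(.x, person), 2))))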
# A tibble: 6 × 3
# Groups: action [2]
action V1 V2
<chr> <chr> <chr>
1 x a b
2 x a c
3 x b c
4 y d e
5 y d f
6 y e f
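To get the person1 and person2 column names requested in the question instead of V1 and V2, the columns can be named inside the group_modify() call. This is a minimal sketch, assuming df is the question's data frame; the names m and pairs are only illustrative. Note that combn() raises an error for a group with fewer than two people, so such groups would need to be filtered out first.

```r
library(dplyr)

# The question's example data
df <- data.frame(person = c("a", "b", "c", "d", "e", "f"),
                 action = c("x", "x", "x", "y", "y", "y"))

pairs <- df %>%
  group_by(action) %>%
  group_modify(~ {
    # combn() gives a 2 x k matrix; transpose so each row is one pair
    m <- t(combn(.x$person, 2))
    tibble(person1 = m[, 1], person2 = m[, 2])
  }) %>%
  ungroup()

pairs
```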
CodePudding user response:
How about this:
library(dplyr)
library(tidyr)
person<-c("a", "b", "c", "d", "e", "f")
action<-c("x", "x", "x", "y", "y", "y")
dat <- data.frame(person, action)
dat %>%
  group_by(action) %>%
  summarise(person = as.data.frame(t(combn(person, 2)))) %>%
  unnest(person) %>%
  rename(person1 = V1, person2 = V2)
#> `summarise()` has grouped output by 'action'. You can override using the
#> `.groups` argument.
#> # A tibble: 6 × 3
#> # Groups: action [2]
#> action person1 person2
#> <chr> <chr> <chr>
#> 1 x a b
#> 2 x a c
#> 3 x b c
#> 4 y d e
#> 5 y d f
#> 6 y e f
Created on 2022-04-21 by the reprex package (v2.0.1)
CodePudding user response:
Here is a one-liner in base R.
person <- c("a", "b", "c", "d", "e", "f")
action <- c("x", "x", "x", "y", "y", "y")
df <- data.frame(person, action)
setNames(
  do.call(
    rbind,
    lapply(split(df, df$action),
           function(x) as.data.frame(t(combn(x$person, 2))))),
  c("person1", "person2"))
# person1 person2
# x.1 a b
# x.2 a c
# x.3 b c
# y.1 d e
# y.2 d f
# y.3 e f
CodePudding user response:
Using base R, with a self-merge on action (dat is the data frame defined in the earlier answer):
subset(merge(dat, dat, by = 'action'), person.x != person.y &
         duplicated(paste(pmin(person.x, person.y), pmax(person.x, person.y))))
action person.x person.y
4 x b a
7 x c a
8 x c b
13 y e d
16 y f d
17 y f e
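The output above lists each pair with the later person first (b a rather than a b). A possible variant of the same self-merge idea keeps only the rows where person.x sorts before person.y, which drops self-pairs and mirrored duplicates in one condition and gives the orientation shown in the question. This is a sketch, assuming the IDs can be compared as strings; the name pairs and the renamed columns are illustrative.

```r
# The question's example data
dat <- data.frame(person = c("a", "b", "c", "d", "e", "f"),
                  action = c("x", "x", "x", "y", "y", "y"))

# Self-merge pairs everyone within the same action;
# person.x < person.y keeps each unordered pair exactly once
pairs <- subset(merge(dat, dat, by = "action"), person.x < person.y)
names(pairs) <- c("action", "person1", "person2")

pairs
```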