I have a data frame like the one shown below. Each pair of ID numbers (e.g. 2891, 2892) corresponds to pair of twins.
ID zyg.x CDsum
1 2891 2 0
2 2892 2 5
3 4000 1 0
4 4001 1 0
5 4006 2 0
6 4007 2 3
I would like to reshape this data frame to make it look like this... Note that the zyg.x (zygosity) value is the same for each twin in the pair.
Twin Pair zyg CDsumTwin1 CDsumTwin2
1 pair1(2891,2892) 2 0 5
2 pair2(4000,4001) 1 0 0
3 pair3(4006,4007) 2 0 3
Any help would be much appreciated.
CodePudding user response:
Data:
df <- read.table(text = " ID zyg.x CDsum
1 2891 2 0
2 2892 2 5
3 4000 1 0
4 4001 1 0
5 4006 2 0
6 4007 2 3")
Arrange by ID and create a variable "twin" to distinguish the two twins in each pair
df<- df %>%
arrange(ID) %>%
mutate(twin = rep(c(1, 2),length.out = n()))
df
ID zyg.x CDsum twin
1 2891 2 0 1
2 2892 2 5 2
3 4000 1 0 1
4 4001 1 0 2
5 4006 2 0 1
6 4007 2 3 2
Split df into two dataframes - for twin1 and twin2
twin1 <- df %>%
filter(twin == 1) %>%
select(-twin) %>%
rename(CDsumTwin1 = CDsum,
ID1 = ID)
twin1
ID1 zyg.x CDsumTwin1
1 2891 2 0
3 4000 1 0
5 4006 2 0
twin2 <- df %>%
filter(twin == 2) %>%
select(-twin) %>%
rename(CDsumTwin2 = CDsum,
ID2 = ID)
twin2
ID2 zyg.x CDsumTwin2
2 2892 2 5
4 4001 1 0
6 4007 2 3
cbind, combine and rearrange columns:
twin1 %>% cbind(twin2 %>% select(-zyg.x)) %>%
mutate(`Twin Pair` = paste0("pair (", ID1, ", ", ID2, ")")) %>%
select(`Twin Pair`, zyg.x, CDsumTwin1, CDsumTwin2)
Twin Pair zyg.x CDsumTwin1 CDsumTwin2
1 pair (2891, 2892) 2 0 5
3 pair (4000, 4001) 1 0 0
5 pair (4006, 4007) 2 0 3
CodePudding user response:
Here is how we could achieve this with dplyr
only:
library(dplyr)
df %>%
mutate(rn = ceiling(row_number()/2)) %>%
group_by(rn) %>%
mutate(Twin_Pair = paste0(ID, collapse = ","),
Twin_Pair = paste0("pair",rn, "(",Twin_Pair, ")")) %>%
mutate(CDsumTwin1 = first(CDsum),
CDsumTwin2 = last(CDsum), .keep="unused") %>%
slice(2) %>%
ungroup() %>%
select(Twin_Pair, zyg=zyg.x, CDsumTwin1, CDsumTwin2)
output:
Twin_Pair zyg CDsumTwin1 CDsumTwin2
<chr> <dbl> <dbl> <dbl>
1 pair1(2891,2892) 2 0 5
2 pair2(4000,4001) 1 0 0
3 pair3(4006,4007) 2 0 3