I have this data set, with values for twins within families:
zyg fid x_t1 x_t2 y_t1 y_t2
1 499474 NA 1 1 NA
1 499474 NA NA NA NA
1 499474 NA NA NA 1
1 499474 NA NA NA NA
1 499540 NA NA 1 NA
1 499540 NA NA NA NA
2 499874 NA NA NA NA
2 499874 NA NA 1 NA
2 499874 NA NA NA 1
2 499874 2 NA NA 1
- How do I collapse the families retaining the phenotype information for x and y, when these are present?
The expected for family 499479 is:
zyg fid x_t1 x_t2 y_t1 y_t2
1 499474 NA 1 1 1
and for family 499874, it should be:
2 499874 2 NA 1 1
CodePudding user response:
You can use the following code:
library(dplyr)
df %>%
group_by(fid) %>%
summarise_all(~first(na.omit(.)))
Output:
# A tibble: 3 × 6
fid zyg x_t1 x_t2 y_t1 y_t2
<int> <int> <int> <int> <int> <int>
1 499474 1 NA 1 1 1
2 499540 1 NA NA 1 NA
3 499874 2 2 NA 1 1
Your data:
df<-structure(list(zyg = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L
), fid = c(499474L, 499474L, 499474L, 499474L, 499540L, 499540L,
499874L, 499874L, 499874L, 499874L), x_t1 = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, 2L), x_t2 = c(1L, NA, NA, NA, NA, NA, NA,
NA, NA, NA), y_t1 = c(1L, NA, NA, NA, 1L, NA, NA, 1L, NA, NA),
y_t2 = c(NA, NA, 1L, NA, NA, NA, NA, NA, 1L, 1L)), class = "data.frame", row.names = c(NA,
-10L))
CodePudding user response:
If there are only one non-NA element, per group
library(dplyr)
df1 %>%
group_by(zyg, fid) %>%
summarise(across(everything(), ~ .x[complete.cases(.x)][1]), .groups = "drop")
-output
# A tibble: 3 × 6
zyg fid x_t1 x_t2 y_t1 y_t2
<int> <int> <int> <int> <int> <int>
1 1 499474 NA 1 1 1
2 1 499540 NA NA 1 NA
3 2 499874 2 NA 1 1
data
df1 <- structure(list(zyg = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L
), fid = c(499474L, 499474L, 499474L, 499474L, 499540L, 499540L,
499874L, 499874L, 499874L, 499874L), x_t1 = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, 2L), x_t2 = c(1L, NA, NA, NA, NA, NA, NA,
NA, NA, NA), y_t1 = c(1L, NA, NA, NA, 1L, NA, NA, 1L, NA, NA),
y_t2 = c(NA, NA, 1L, NA, NA, NA, NA, NA, 1L, 1L)),
class = "data.frame", row.names = c(NA,
-10L))
CodePudding user response:
Another possible solution:
library(dplyr)
df %>%
group_by(zyg, fid) %>%
summarise(across(everything(), ~ if (all(is.na(.x))) {NA} else
{max(.x, na.rm = T)}), .groups = "drop")
#> # A tibble: 3 × 6
#> zyg fid x_t1 x_t2 y_t1 y_t2
#> <int> <int> <int> <int> <int> <int>
#> 1 1 499474 NA 1 1 1
#> 2 1 499540 NA NA 1 NA
#> 3 2 499874 2 NA 1 1
CodePudding user response:
very similiar to the other answers but wanted to give my own solution too.
df %>%
group_by(zyg,fid) %>%
summarise(across(everything(),~sum(.,na.rm=TRUE))
)
CodePudding user response:
You want to do something what coalesce
does by rows for columns:
Here is how you could do it:
libarary(dplyr)
coalesce_by_column <- function(df) {
return(dplyr::coalesce(!!! as.list(df)))
}
df %>%
group_by(fid) %>%
summarise(across(everything(), coalesce_by_column))
fid zyg x_t1 x_t2 y_t1 y_t2
<int> <int> <int> <int> <int> <int>
1 499474 1 NA 1 1 1
2 499540 1 NA NA 1 NA
3 499874 2 2 NA 1 1