I have a df with stacked paired (time 1, time 2) observations (subject = id) of variables (v1,v2)
df <- tribble(
~id, ~time, ~v1,~ v2,
1, 1, NA, 7,
2, 1, 3, 7,
3, 1, 2, 6,
1, 2, 4, 5,
2, 2, 3, NA,
3, 2, 7, 6
)
For the paired analysis, I need to drop all ids that have NA in either time. In the example above I would be left with only id "3". How can I achieve this? (dplyr if possible.) Thanks
CodePudding user response:
Use subset
from base R
- get the id
where 'v1' is NA
(id[is.na(v1)]
), create a logical with the original 'id' column (id %in% ..
), negate (!
) to get the ids having no NAs in 'v1'
subset(df, !id %in% id[is.na(v1)])
Or with filter
from dplyr
library(dplyr)
filter(df, !id %in% id[is.na(v1)])
# A tibble: 2 × 3
id time v1
<dbl> <dbl> <dbl>
1 3 1 2
2 3 2 7
CodePudding user response:
Another possible solution:
library(dplyr)
df %>% group_by(id) %>% filter(!any(is.na(v1 v2))) %>% ungroup
#> # A tibble: 2 × 4
#> id time v1 v2
#> <dbl> <dbl> <dbl> <dbl>
#> 1 3 1 2 6
#> 2 3 2 7 6
CodePudding user response:
We can use complete.cases
and all
to return only the groups that contain no NA
s in v1
or v2
.
library(dplyr)
df %>%
group_by(id) %>%
filter(all(complete.cases(v1, v2)))
Output
id time v1 v2
<dbl> <dbl> <dbl> <dbl>
1 3 1 2 6
2 3 2 7 6
If you have a lot more columns that start with v
, then we could use c_across
to specify the columns in starts_with
.
df %>%
group_by(id) %>%
filter(all(complete.cases(c_across(starts_with("v")))))