Home > Back-end >  Remove NA by id on stacked paired observations
Remove NA by id on stacked paired observations

Time:04-16

I have a df with stacked paired (time 1, time 2) observations (subject = id) of variables (v1,v2)

df <- tribble(
  ~id, ~time, ~v1,~ v2,
  1, 1, NA, 7,
  2, 1, 3, 7, 
  3, 1, 2, 6,
  1, 2, 4, 5,
  2, 2, 3, NA,
  3, 2, 7, 6
)

For the paired analysis, I need to drop all ids that have NA in either time. In the example above I would be left with only id "3". How can I achieve this? (dplyr if possible.) Thanks

CodePudding user response:

Use subset from base R- get the id where 'v1' is NA (id[is.na(v1)]), create a logical with the original 'id' column (id %in% ..), negate (!) to get the ids having no NAs in 'v1'

subset(df, !id %in% id[is.na(v1)])

Or with filter from dplyr

library(dplyr)
filter(df, !id %in% id[is.na(v1)])
# A tibble: 2 × 3
     id  time    v1
  <dbl> <dbl> <dbl>
1     3     1     2
2     3     2     7

CodePudding user response:

Another possible solution:

library(dplyr)

df %>% group_by(id) %>% filter(!any(is.na(v1 v2))) %>% ungroup

#> # A tibble: 2 × 4
#>      id  time    v1    v2
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     3     1     2     6
#> 2     3     2     7     6

CodePudding user response:

We can use complete.cases and all to return only the groups that contain no NAs in v1 or v2.

library(dplyr)

df %>%
  group_by(id) %>%
  filter(all(complete.cases(v1, v2)))

Output

     id  time    v1    v2
  <dbl> <dbl> <dbl> <dbl>
1     3     1     2     6
2     3     2     7     6

If you have a lot more columns that start with v, then we could use c_across to specify the columns in starts_with.

df %>%
  group_by(id) %>%
  filter(all(complete.cases(c_across(starts_with("v")))))
  • Related