I have data with a grouping variable "id" and some values in "x". If a group has more than one NA
in 'x', e.g. id 2, I need to filter rows at the index of the first NA
.
data <- data.frame(id = c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4),
x = c('no','no','yes','yes','yes','no',NA,NA,NA,'yes','no','no',NA,NA,'yes','yes','yes', 'yes', 'yes'))
data
id x
1 1 no
2 1 no
3 1 yes
4 1 yes
5 2 yes
6 2 no
7 2 <NA> # slice id 2 at this position
8 2 <NA>
9 2 <NA>
10 3 yes
11 3 no
12 3 no
13 3 <NA> # slice id 3 at this position
14 3 <NA>
15 4 yes
16 4 yes
17 4 yes
18 4 yes
19 4 yes
Desired outcome
data
id x
1 1 no
2 1 no
3 1 yes
4 1 yes
5 2 yes
6 2 no
7 2 <NA>
8 3 yes
9 3 no
10 3 no
11 3 <NA>
12 4 yes
13 4 yes
14 4 yes
15 4 yes
19 4 yes
I try it using
library(tidyverse)
Sliced <- data %>% group_by(id) %>%
slice(seq_len(min(which(x==NA)[1], n())))
CodePudding user response:
How about this:
data %>%
group_by(id) %>%
mutate(nas = cumsum(is.na(x))) %>% # counts cumulative NAs in x
filter(nas <= 1) %>% # remove if we've seen more than 1 NA for the id
select(-i) # drop temp variable
Note that you didn't specify what assumptions can be made about the incoming data. For example:
- is it always in some order such that the NAs/non-NA values don't get jumbled:
- will non-NA values followed by NA values ever return to non-NA again?
- do you still want to slice if you've seen just one NA (might be implied, given the previous answer)?
- what if there's just 1 NA and no non-NA values preceding?
CodePudding user response:
Various dplyr
-options:
A filter
and a lag
? (Depending on data structure you might want to group_by(id)
)
library(dplyr)
data |>
filter(!is.na(x) | (is.na(x) & !is.na(lag(x))))
Or with slice
as you suggest yourself:
library(dplyr)
data |>
group_by(id) |>
slice(1:first(which(is.na(x)), default = n())) |>
ungroup()
Or using duplicated
like @ThomasIsCoding:
library(dplyr)
data |>
group_by(id) |>
filter(!(duplicated(x) & is.na(x))) |>
ungroup()
Not forgetting @Hendys: filter(cumsum(is.na(x)) < 2)
Output:
id x
1 1 no
2 1 no
3 1 yes
4 1 yes
5 2 yes
6 2 no
7 2 <NA>
8 3 yes
9 3 no
10 3 no
11 3 <NA>
12 4 yes
13 4 yes
14 4 yes
15 4 yes
16 4 yes
CodePudding user response:
A base R option using subset
ave
duplicated
subset(
data,
!(ave(is.na(x), id, is.na(x), FUN = duplicated) & is.na(x))
)
gives
id x
1 1 no
2 1 no
3 1 yes
4 1 yes
5 2 yes
6 2 no
7 2 <NA>
10 3 yes
11 3 no
12 3 no
13 3 <NA>
15 4 yes
16 4 yes
17 4 yes
18 4 yes
19 4 yes