I would like to be able to keep NAs for only groups that have more than two entries and just want to leave alone any groups that have 1 entry (regardless if they have NAs or not). That is, if the group has two elements, keep only the NA. If it has one then just take whatever is there. Here is a reprex of the type of data I have:
library(dplyr)
data <- data.frame(
x = c(1, NA_real_, 3, NA_real_),
y = c("grp1", "grp2", "grp2", "grp3")
)
data
#> x y
#> 1 1 grp1
#> 2 NA grp2
#> 3 3 grp2
#> 4 NA grp3
Then here is the fairly ugly way I have achieved what I want:
raw <- data %>%
group_by(y) %>%
mutate(n = n())
results <- bind_rows(
raw %>%
filter(n == 2) %>%
filter(is.na(x)),
raw %>%
filter(n == 1)
) %>%
ungroup() %>%
select(-n)
results
#> # A tibble: 3 × 2
#> x y
#> <dbl> <chr>
#> 1 NA grp2
#> 2 1 grp1
#> 3 NA grp3
CodePudding user response:
We could define filter
with the max(row_number()
:
update: instead of max(row_number()
we could use n()
(many thanks to @Juan C)
library(dplyr)
data %>%
group_by(y) %>%
filter(!(is.na(x) & n() == 1))
# filter(!(is.na(x) & max(row_number()) == 1))
x y
<dbl> <chr>
1 1 grp1
2 NA grp2
3 3 grp2
CodePudding user response:
Here's another approach that works
data %>%
group_by(y) %>%
filter(!(!is.na(x) & n() > 1))
# A tibble: 3 × 2
# Groups: y [3]
x y
<dbl> <chr>
1 1 grp1
2 NA grp2
3 NA grp3
CodePudding user response:
You specifically asked to do using dplyr
, however, you can do this using data.table
. It is more intuitive to follow:
library(data.table)
setDT(data)
data <- data[, N := .N, by = y][N <= 1 | !is.na(x)][, N := NULL]
data
Output:
x y
1: 1 grp1
2: 3 grp2
3: NA grp3
If the objective is to retain NAs cases if group has more than one observations, then follow:
data <- data[, N := .N, by = y][N <= 1 | is.na(x)][, N := NULL]
data
Output:
x y
1: 1 grp1
2: NA grp2
3: NA grp3