I have a dataframe like this:
df <- data.frame(
id = 1:19,
Area_l = c(1,2,0,0,0,2,3,1,2,0,0,0,0,3,4,0,0,0,0),
Area_r = c(3,2,2,0,0,2,3,1,0,0,0,1,3,3,4,3,0,0,0)
)
I need to filter the dataframe in such a way that all rows are omitted that fulfill two conditions:
(i): Area_l
and Area_r
are 0
(ii): the paired 0
values in Area_l
and Area_r
are the last values in the columns.
I really have no clue how to implement these two conditions using dplyr
. The desired result is this:
df
id Area_l Area_r
1 1 1 3
2 2 2 2
3 3 0 2
4 4 0 0
5 5 0 0
6 6 2 2
7 7 3 3
8 8 1 1
9 9 2 0
10 10 0 0
11 11 0 0
12 12 0 1
13 13 0 3
14 14 3 3
15 15 4 4
16 16 0 3
Any help?
CodePudding user response:
Reverse the order of the dataframe, filter with a cumany
condition, then reverse it back.
library(dplyr)
df %>%
map_df(rev) %>%
filter(cumany(Area_l Area_r != 0)) %>%
map_df(rev)
output
# A tibble: 16 x 3
id Area_l Area_r
<int> <dbl> <dbl>
1 1 1 3
2 2 2 2
3 3 0 2
4 4 0 0
5 5 0 0
6 6 2 2
7 7 3 3
8 8 1 1
9 9 2 0
10 10 0 0
11 11 0 0
12 12 0 1
13 13 0 3
14 14 3 3
15 15 4 4
16 16 0 3
CodePudding user response:
We may use rle
library(dplyr)
df %>%
filter(!if_all(starts_with("Area"), ~ .x == 0 &
inverse.rle(within.list(rle(.x == 0), values[-length(values)] <- FALSE))))
-output
id Area_l Area_r
1 1 1 3
2 2 2 2
3 3 0 2
4 4 0 0
5 5 0 0
6 6 2 2
7 7 3 3
8 8 1 1
9 9 2 0
10 10 0 0
11 11 0 0
12 12 0 1
13 13 0 3
14 14 3 3
15 15 4 4
16 16 0 3
Or another option is
df %>%
filter(if_any(starts_with("Area"),
~ row_number() <= max(row_number() * (.x != 0))))
Or another option is revcumsum
from spatstat.utils
library(spatstat.utils)
df %>%
filter(!if_all(starts_with("Area"), ~ revcumsum(.x != 0) <1))