Home > Blockchain >  Filter dataframe on occurrence of same values across columns AND at column end
Filter dataframe on occurrence of same values across columns AND at column end

Time:02-28

I have a dataframe like this:

df <- data.frame(
  id = 1:19,
  Area_l = c(1,2,0,0,0,2,3,1,2,0,0,0,0,3,4,0,0,0,0),
  Area_r = c(3,2,2,0,0,2,3,1,0,0,0,1,3,3,4,3,0,0,0)
)

I need to filter the dataframe in such a way that all rows are omitted that fulfill two conditions:

(i): Area_l and Area_r are 0

(ii): the paired 0values in Area_l and Area_r are the last values in the columns.

I really have no clue how to implement these two conditions using dplyr. The desired result is this:

df
   id Area_l Area_r
1   1      1      3
2   2      2      2
3   3      0      2
4   4      0      0
5   5      0      0
6   6      2      2
7   7      3      3
8   8      1      1
9   9      2      0
10 10      0      0
11 11      0      0
12 12      0      1
13 13      0      3
14 14      3      3
15 15      4      4
16 16      0      3

Any help?

CodePudding user response:

Reverse the order of the dataframe, filter with a cumany condition, then reverse it back.

library(dplyr)
df %>% 
  map_df(rev) %>% 
  filter(cumany(Area_l   Area_r != 0)) %>% 
  map_df(rev)

output

# A tibble: 16 x 3
      id Area_l Area_r
   <int>  <dbl>  <dbl>
 1     1      1      3
 2     2      2      2
 3     3      0      2
 4     4      0      0
 5     5      0      0
 6     6      2      2
 7     7      3      3
 8     8      1      1
 9     9      2      0
10    10      0      0
11    11      0      0
12    12      0      1
13    13      0      3
14    14      3      3
15    15      4      4
16    16      0      3

CodePudding user response:

We may use rle

library(dplyr)
df %>% 
   filter(!if_all(starts_with("Area"), ~ .x == 0 & 
     inverse.rle(within.list(rle(.x == 0), values[-length(values)] <- FALSE))))

-output

    id Area_l Area_r
1   1      1      3
2   2      2      2
3   3      0      2
4   4      0      0
5   5      0      0
6   6      2      2
7   7      3      3
8   8      1      1
9   9      2      0
10 10      0      0
11 11      0      0
12 12      0      1
13 13      0      3
14 14      3      3
15 15      4      4
16 16      0      3

Or another option is

df %>%
   filter(if_any(starts_with("Area"), 
        ~ row_number() <= max(row_number() * (.x != 0))))

Or another option is revcumsum from spatstat.utils

library(spatstat.utils)
df %>% 
    filter(!if_all(starts_with("Area"), ~ revcumsum(.x != 0) <1))
  • Related