Removing duplicated rows in one column where another column meets a condition

Time:09-14

I have the following data example:

    df <- structure(list(cycle = structure(c(1606782894, 1606786502, 1606790113, 
    1606793721, 1606800941, 1606800941, 1606804550, 1606808160, 1606808160, 
    1606845846), tzone = "UTC", class = c("POSIXct", "POSIXt")), 
        N = c(0L, 0L, 0L, 0L, 0L, 3L, 0L, 0L, 1L, 0L)), class = c("grouped_df", 
    "tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L), groups = structure(list(
        cycle = structure(c(1606782894, 1606786502, 1606790113, 1606793721, 
        1606797332, 1606800941, 1606804550, 1606808160, 1606845846
        ), tzone = "UTC", class = c("POSIXct", "POSIXt")), .rows = structure(list(
            1L, 2L, 3L, 4L, 5L, 6L, 7L, 8:9, 10L), ptype = integer(0), class = c("vctrs_list_of", 
        "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
    ), row.names = c(NA, -9L), .drop = TRUE))

which looks like

       cycle                   N
       <dttm>              <int>
     1 2020-12-01 00:34:54     0
     2 2020-12-01 01:35:02     0
     3 2020-12-01 02:35:13     0
     4 2020-12-01 03:35:21     0
     5 2020-12-01 05:35:41     0
     6 2020-12-01 05:35:41     3
     7 2020-12-01 06:35:50     0
     8 2020-12-01 07:36:00     0
     9 2020-12-01 07:36:00     1
    10 2020-12-01 18:04:06     0

Rows 5-6 and 8-9 have the same value in the cycle column. Among these duplicated rows, I would like to remove the ones where column N is 0.

I appreciate any help

Thanks

CodePudding user response:

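As the data is grouped by 'cycle', we can use filter to drop the rows where N is 0 in every group that has more than one row (group_by is repeated here so the code also works on ungrouped data):

    library(dplyr)
    df %>%
        group_by(cycle) %>%
        # drop N == 0 rows only in groups with more than one row
        filter(!(n() > 1 & N == 0)) %>%
        ungroup()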

Output:

    # A tibble: 8 × 2
      cycle                   N
      <dttm>              <int>
    1 2020-12-01 00:34:54     0
    2 2020-12-01 01:35:02     0
    3 2020-12-01 02:35:13     0
    4 2020-12-01 03:35:21     0
    5 2020-12-01 05:35:41     3
    6 2020-12-01 06:35:50     0
    7 2020-12-01 07:36:00     1
    8 2020-12-01 18:04:06     0
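
One edge case is worth checking before relying on the filter condition above: a cycle that is duplicated but has N equal to 0 in every row. The sketch below uses a small hypothetical `toy` tibble (not the question's data) to show that the condition removes such a group entirely, and adds a variant with `any(N != 0)` that keeps it. The variant is an assumption about what might be wanted, not part of the answer.

```r
library(dplyr)

# Hypothetical toy data: "b" is duplicated with one non-zero N,
# "c" is duplicated but every N is 0.
toy <- tibble(
  cycle = c("a", "b", "b", "c", "c"),
  N     = c(0L, 0L, 3L, 0L, 0L)
)

# Condition from the answer: drop zero rows in any group of size > 1.
res_all <- toy %>%
  group_by(cycle) %>%
  filter(!(n() > 1 & N == 0)) %>%
  ungroup()

# Variant (assumption): only drop zero rows when the group
# also contains at least one non-zero N.
res_keep <- toy %>%
  group_by(cycle) %>%
  filter(!(n() > 1 & N == 0 & any(N != 0))) %>%
  ungroup()

print(res_all)   # cycle "c" is gone entirely
print(res_keep)  # both "c" rows are kept
```

On the question's data the two versions should agree, since no duplicated cycle there consists only of zeros.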

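Or use duplicated after arranging the data so that, within each cycle, the rows with non-zero N come first:

    df %>%
        ungroup() %>%
        # N == 0 is FALSE for non-zero rows, so they sort first within each cycle
        arrange(cycle, N == 0) %>%
        # drop a zero row only if it is not the first row of its cycle
        filter(!(duplicated(cycle) & N == 0))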

Output:

    # A tibble: 8 × 2
      cycle                   N
      <dttm>              <int>
    1 2020-12-01 00:34:54     0
    2 2020-12-01 01:35:02     0
    3 2020-12-01 02:35:13     0
    4 2020-12-01 03:35:21     0
    5 2020-12-01 05:35:41     3
    6 2020-12-01 06:35:50     0
    7 2020-12-01 07:36:00     1
    8 2020-12-01 18:04:06     0
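
To see why the sort matters for the duplicated approach, it can help to look at the intermediate steps. This is a minimal sketch on a hypothetical `toy` tibble (the names `sorted`, `dup` and `res` are illustrative only): after sorting, `duplicated(cycle)` is FALSE for the first row of each cycle, so a zero row is dropped only when another row of the same cycle precedes it.

```r
library(dplyr)

# Hypothetical toy data, not the question's data.
toy <- tibble(
  cycle = c("a", "b", "b", "c", "c"),
  N     = c(0L, 0L, 3L, 0L, 0L)
)

# Within each cycle, non-zero N (N == 0 is FALSE) sorts before zero N.
sorted <- toy %>% arrange(cycle, N == 0)

# TRUE marks every row after the first one of its cycle.
dup <- duplicated(sorted$cycle)

# Drop rows that are both a repeat of their cycle and have N == 0.
res <- sorted %>% filter(!(duplicated(cycle) & N == 0))

print(sorted)
print(dup)
print(res)
```

Note that for a cycle whose rows are all zero (like "c" here), this version keeps exactly one row, which differs from a filter based on group size.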

CodePudding user response:

A base R code-golf option, using subset with ave:

    > subset(df, N | !ave(N, cycle))
    # A tibble: 8 × 2
    # Groups:   cycle [8]
      cycle                   N
      <dttm>              <int>
    1 2020-12-01 00:34:54     0
    2 2020-12-01 01:35:02     0
    3 2020-12-01 02:35:13     0
    4 2020-12-01 03:35:21     0
    5 2020-12-01 05:35:41     3
    6 2020-12-01 06:35:50     0
    7 2020-12-01 07:36:00     1
    8 2020-12-01 18:04:06     0
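
The one-liner is terse, so here is an unpacked sketch of what it appears to do, on a hypothetical `toy` data frame. `ave(N, cycle)` returns the per-cycle mean of N (mean is the default FUN), and a row is kept when its own N is non-zero or when its cycle's mean is zero. This assumes N is a non-negative count, so a zero mean implies every N in the cycle is 0; with negative values the means could cancel out.

```r
# Hypothetical toy data, not the question's data.
toy <- data.frame(
  cycle = c("a", "b", "b", "c", "c"),
  N     = c(0L, 0L, 3L, 0L, 0L)
)

# Per-row group mean of N within each cycle.
grp_mean <- ave(toy$N, toy$cycle)

# Keep a row if its N is non-zero, or if its whole cycle averages to zero.
keep <- toy$N != 0 | grp_mean == 0
explicit <- toy[keep, ]

# The code-golf form relies on numeric-to-logical coercion for the same test.
golf <- subset(toy, N | !ave(N, cycle))

print(explicit)
print(golf)
```

Unlike a filter based on group size, this keeps every row of a duplicated cycle whose N values are all 0.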