I have the following data example:
df <- structure(list(cycle = structure(c(1606782894, 1606786502, 1606790113,
1606793721, 1606800941, 1606800941, 1606804550, 1606808160, 1606808160,
1606845846), tzone = "UTC", class = c("POSIXct", "POSIXt")),
N = c(0L, 0L, 0L, 0L, 0L, 3L, 0L, 0L, 1L, 0L)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L), groups = structure(list(
cycle = structure(c(1606782894, 1606786502, 1606790113, 1606793721,
1606797332, 1606800941, 1606804550, 1606808160, 1606845846
), tzone = "UTC", class = c("POSIXct", "POSIXt")), .rows = structure(list(
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8:9, 10L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -9L), .drop = TRUE))
which looks like
cycle N
<dttm> <int>
1 2020-12-01 00:34:54 0
2 2020-12-01 01:35:02 0
3 2020-12-01 02:35:13 0
4 2020-12-01 03:35:21 0
5 2020-12-01 05:35:41 0
6 2020-12-01 05:35:41 3
7 2020-12-01 06:35:50 0
8 2020-12-01 07:36:00 0
9 2020-12-01 07:36:00 1
10 2020-12-01 18:04:06 0
Rows 5-6 and 8-9 have the same value in the cycle
column. I would like to remove the duplicated rows where column N
is 0.
I appreciate any help
Thanks
CodePudding user response:
Since the data is already grouped by 'cycle', we can use filter:
library(dplyr)
df %>%
group_by(cycle) %>%
filter(!(n() > 1 & N == 0)) %>%
ungroup
-output
# A tibble: 8 × 2
cycle N
<dttm> <int>
1 2020-12-01 00:34:54 0
2 2020-12-01 01:35:02 0
3 2020-12-01 02:35:13 0
4 2020-12-01 03:35:21 0
5 2020-12-01 05:35:41 3
6 2020-12-01 06:35:50 0
7 2020-12-01 07:36:00 1
8 2020-12-01 18:04:06 0
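One thing to watch with this condition: if a cycle is duplicated and every one of its rows has N == 0, the filter above drops the whole cycle. That does not happen in the example data, but if it can happen in yours, a variant along these lines might help. This is only a sketch on a made-up `toy` tibble (the data and the name `toy` are my own assumptions, not from the question):

```r
library(dplyr)

# Hypothetical toy data: cycle 2 is duplicated and has only zeros,
# cycle 3 is duplicated with one zero and one non-zero row
toy <- tibble(cycle = c(1, 2, 2, 3, 3), N = c(0L, 0L, 0L, 0L, 4L))

# Condition from the answer above: cycle 2 disappears completely
original <- toy %>%
  group_by(cycle) %>%
  filter(!(n() > 1 & N == 0)) %>%
  ungroup()

# Variant: keep non-zero rows; if a group has only zeros, keep its first row
result <- toy %>%
  group_by(cycle) %>%
  filter(N != 0 | (all(N == 0) & row_number() == 1)) %>%
  ungroup()
```

Here `original` keeps only cycles 1 and 3, while `result` also keeps a single zero row for cycle 2.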
Or using duplicated after arranging the data
df %>%
ungroup %>%
arrange(cycle, N == 0) %>%
filter(!(duplicated(cycle) & N == 0))
-output
# A tibble: 8 × 2
cycle N
<dttm> <int>
1 2020-12-01 00:34:54 0
2 2020-12-01 01:35:02 0
3 2020-12-01 02:35:13 0
4 2020-12-01 03:35:21 0
5 2020-12-01 05:35:41 3
6 2020-12-01 06:35:50 0
7 2020-12-01 07:36:00 1
8 2020-12-01 18:04:06 0
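To see why the arrange step matters here, a minimal base R sketch of the same idea may help. The `toy` data frame is made up for illustration (it is not the question's data): within each duplicated cycle the zero row comes first, as in the question.

```r
# Hypothetical toy data: cycles 5 and 8 are duplicated, zero row listed first
toy <- data.frame(cycle = c(1, 5, 5, 7, 8, 8),
                  N     = c(0L, 0L, 3L, 0L, 0L, 1L))

# Without sorting, duplicated() flags the second (non-zero) row of each
# cycle, so no row is both flagged and zero and nothing is removed
unsorted_result <- toy[!(duplicated(toy$cycle) & toy$N == 0), ]

# Sorting on N == 0 puts FALSE (non-zero rows) first within each cycle,
# so the zero rows become the flagged duplicates and are dropped
sorted <- toy[order(toy$cycle, toy$N == 0), ]
sorted_result <- sorted[!(duplicated(sorted$cycle) & sorted$N == 0), ]
```

On this toy data `unsorted_result` still has all 6 rows, while `sorted_result` has the 4 rows with N equal to 0, 3, 0, 1.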
CodePudding user response:
A base R code-golf solution, with subset and ave
> subset(df, N | !ave(N, cycle))
# A tibble: 8 × 2
# Groups: cycle [8]
cycle N
<dttm> <int>
1 2020-12-01 00:34:54 0
2 2020-12-01 01:35:02 0
3 2020-12-01 02:35:13 0
4 2020-12-01 03:35:21 0
5 2020-12-01 05:35:41 3
6 2020-12-01 06:35:50 0
7 2020-12-01 07:36:00 1
8 2020-12-01 18:04:06 0
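For readers unfamiliar with ave, the one-liner can be unpacked roughly as follows. This is a sketch on made-up data (the `toy` data frame is an assumption for illustration), and it relies on N being a non-negative count, so that a group mean of 0 means every N in that group is 0.

```r
# Hypothetical toy data: cycles 5 and 8 are duplicated
toy <- data.frame(cycle = c(1, 5, 5, 7, 8, 8),
                  N     = c(0L, 0L, 3L, 0L, 0L, 1L))

# ave() returns, for each row, the mean of N within that row's cycle
grp_mean <- ave(toy$N, toy$cycle)

# Keep a row if its own N is non-zero, or if its whole cycle is zero.
# In the one-liner, `N` and `!ave(N, cycle)` rely on numeric-to-logical
# coercion (0 is FALSE, anything else is TRUE) to express the same test.
keep <- toy$N != 0 | grp_mean == 0
result <- toy[keep, ]
```

`result` should match what `subset(toy, N | !ave(N, cycle))` returns on the same data.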