I have data frame in R, which looks as follows:
quotedate | days |
---|---|
2002-01-01 | 12 |
2002-01-01 | 22 |
2002-01-01 | 32 |
2002-01-01 | 42 |
2002-01-03 | 2 |
2002-01-03 | 5 |
2002-01-03 | 12 |
2002-01-03 | 21 |
2002-01-05 | 12 |
2002-01-05 | 212 |
2002-01-05 | 32 |
2002-01-05 | 142 |
2002-01-07 | 112 |
2002-01-07 | 122 |
2002-01-07 | 132 |
2002-01-07 | 142 |
How can I exclude days, which have not 30 days within their days' range.
In the above exapmle I want to keep 2002-01-01 and 2002-01-05.
Do you have any suggestions?
Thank you!
CodePudding user response:
You can use group_by
and filter
:
library(tidyverse)
df %>%
group_by(quotedate) %>%
filter(max(days) >= 30)
quotedate days
1 2002-01-01 12
2 2002-01-01 22
3 2002-01-01 32
4 2002-01-01 42
5 2002-01-05 12
6 2002-01-05 212
7 2002-01-05 32
8 2002-01-05 142
9 2002-01-07 112
10 2002-01-07 122
11 2002-01-07 132
12 2002-01-07 142
CodePudding user response:
Creating a boolean subset using max
in ave
.
dat[with(dat, ave(days, quotedate, FUN=max) >= 30), ]
# quotedate days
# 1 2002-01-01 12
# 2 2002-01-01 22
# 3 2002-01-01 32
# 4 2002-01-01 42
# 9 2002-01-05 12
# 10 2002-01-05 212
# 11 2002-01-05 32
# 12 2002-01-05 142
# 13 2002-01-07 112
# 14 2002-01-07 122
# 15 2002-01-07 132
# 16 2002-01-07 142
Data:
dat <- structure(list(quotedate = c("2002-01-01", "2002-01-01", "2002-01-01",
"2002-01-01", "2002-01-03", "2002-01-03", "2002-01-03", "2002-01-03",
"2002-01-05", "2002-01-05", "2002-01-05", "2002-01-05", "2002-01-07",
"2002-01-07", "2002-01-07", "2002-01-07"), days = c(12L, 22L,
32L, 42L, 2L, 5L, 12L, 21L, 12L, 212L, 32L, 142L, 112L, 122L,
132L, 142L)), class = "data.frame", row.names = c(NA, -16L))