I have this dataframe:
df1 <- structure(list(ID = c(1, 2, 2, 2, 3, 4, 5, 6, 6, 7, 8, 8, 9,
10), dateA = structure(c(14974, 18628, 18628, 18628, 14882, 16800,
14882, 17835, 17835, 16832, 16556, 16556, 15949, 16801), class = "Date"),
dateB = structure(c(14610, 15340, 15706, 17501, 14730, NA,
14700, 16191, 17106, 16801, 15810, 16436, 14655, 15431), class = "Date"),
dateC = structure(c(18628, 15705, 17500, 18628, 18628, NA,
18628, 17105, 18628, 18628, 16435, 16556, 15706, 18628), class = "Date")), row.names = c(NA,
-14L), class = c("data.table", "data.frame"))
ID dateA dateB dateC
1: 1 2010-12-31 2010-01-01 2021-01-01
2: 2 2021-01-01 2012-01-01 2012-12-31
3: 2 2021-01-01 2013-01-01 2017-11-30
4: 2 2021-01-01 2017-12-01 2021-01-01
5: 3 2010-09-30 2010-05-01 2021-01-01
6: 4 2015-12-31 <NA> <NA>
7: 5 2010-09-30 2010-04-01 2021-01-01
8: 6 2018-10-31 2014-05-01 2016-10-31
9: 6 2018-10-31 2016-11-01 2021-01-01
10: 7 2016-02-01 2016-01-01 2021-01-01
11: 8 2015-05-01 2013-04-15 2014-12-31
12: 8 2015-05-01 2015-01-01 2015-05-01
13: 9 2013-09-01 2010-02-15 2013-01-01
14: 10 2016-01-01 2012-04-01 2021-01-01
I would like to check whether dateA falls within the interval from dateB to dateC. My code:
library(dplyr)
df1 %>%
mutate(match= ifelse(between(dateA, dateB, dateC), 1, 0))
gives:
Error: Problem with `mutate()` column `match`.
i `match = ifelse(between(dateA, dateB, dateC), 1, 0)`.
x Not yet implemented NAbounds=TRUE for this non-numeric and non-character type
If I remove the row that contains NA, the code works:
df1 %>%
slice(-6) %>%
mutate(match= ifelse(between(dateA, dateB, dateC), 1, 0))
Can I keep the row with NA and still run my code?
CodePudding user response:
It is unclear which between the OP was using, because the input object is a data.table while the code is dplyr. If both packages are loaded, each provides a between function, and whichever package was loaded last masks the other's between. dplyr::between is not fully vectorized; in the version documented here, ?dplyr::between states:
left, right Boundary values (must be scalars).
So with dplyr::between the comparison has to be done row by row:
df1 %>%
rowwise() %>%
mutate(match = +(dplyr::between(dateA, dateB, dateC))) %>%
ungroup()
-output
# A tibble: 14 × 5
ID dateA dateB dateC match
<dbl> <date> <date> <date> <int>
1 1 2010-12-31 2010-01-01 2021-01-01 1
2 2 2021-01-01 2012-01-01 2012-12-31 0
3 2 2021-01-01 2013-01-01 2017-11-30 0
4 2 2021-01-01 2017-12-01 2021-01-01 1
5 3 2010-09-30 2010-05-01 2021-01-01 1
6 4 2015-12-31 NA NA NA
7 5 2010-09-30 2010-04-01 2021-01-01 1
8 6 2018-10-31 2014-05-01 2016-10-31 0
9 6 2018-10-31 2016-11-01 2021-01-01 1
10 7 2016-02-01 2016-01-01 2021-01-01 1
11 8 2015-05-01 2013-04-15 2014-12-31 0
12 8 2015-05-01 2015-01-01 2015-05-01 1
13 9 2013-09-01 2010-02-15 2013-01-01 0
14 10 2016-01-01 2012-04-01 2021-01-01 1
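If going row by row is not desirable, the same check can be sketched with plain vectorized comparisons, which need neither package's between. This is only an illustration on a three-row toy data frame (the name df and its rows are made up here to mirror rows 1, 2 and 6 of df1), and it assumes inclusive bounds, as between uses by default:

```r
# Toy data mirroring three rows of df1, including the row with missing bounds
df <- data.frame(
  dateA = as.Date(c("2010-12-31", "2021-01-01", "2015-12-31")),
  dateB = as.Date(c("2010-01-01", "2012-01-01", NA)),
  dateC = as.Date(c("2021-01-01", "2012-12-31", NA))
)

# Element-wise comparison on Date columns; a missing bound propagates as NA
df$match <- as.integer(df$dateA >= df$dateB & df$dateA <= df$dateC)
df
```

The row with missing bounds stays in the data and gets NA rather than 0 or 1.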
data.table::between does not have this restriction (judging by the error shown in the OP's post, the between used was the one from data.table). ?data.table::between documents:
lower - Lower range bound. Either length 1 or same length as x.
upper - Upper range bound. Either length 1 or same length as x.
The class can still be an issue, although the documentation suggests otherwise:
x - Any orderable vector, i.e., those with relevant methods for <=, such as numeric, character, Date, etc. in case of between and a numeric vector in case of inrange.
Converting the Date columns to numeric should work:
df1 %>%
mutate(match = +(data.table::between(as.numeric(dateA),
as.numeric(dateB), as.numeric(dateC))))
-output
ID dateA dateB dateC match
1: 1 2010-12-31 2010-01-01 2021-01-01 1
2: 2 2021-01-01 2012-01-01 2012-12-31 0
3: 2 2021-01-01 2013-01-01 2017-11-30 0
4: 2 2021-01-01 2017-12-01 2021-01-01 1
5: 3 2010-09-30 2010-05-01 2021-01-01 1
6: 4 2015-12-31 <NA> <NA> 1
7: 5 2010-09-30 2010-04-01 2021-01-01 1
8: 6 2018-10-31 2014-05-01 2016-10-31 0
9: 6 2018-10-31 2016-11-01 2021-01-01 1
10: 7 2016-02-01 2016-01-01 2021-01-01 1
11: 8 2015-05-01 2013-04-15 2014-12-31 0
12: 8 2015-05-01 2015-01-01 2015-05-01 1
13: 9 2013-09-01 2010-02-15 2013-01-01 0
14: 10 2016-01-01 2012-04-01 2021-01-01 1
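Note that row 6 comes out as 1 here even though both bounds are missing. As far as I understand the data.table documentation, that is the effect of the NAbounds argument, which by default treats a missing bound as unlimited. A minimal numeric sketch of the difference (the values 5 and 10 and the names res_default and res_na are arbitrary, chosen only for illustration):

```r
library(data.table)

x <- 5

# Lower bound is missing; with the default NAbounds = TRUE it is treated as unlimited
res_default <- data.table::between(x, NA_real_, 10)

# NAbounds = NA asks for a missing result when a bound is missing
res_na <- data.table::between(x, NA_real_, 10, NAbounds = NA)

res_default
res_na
```

So after the numeric conversion the NA row is counted as a match, which may or may not be what is wanted.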
Digging deeper, the issue is the NAbounds argument, which is TRUE by default. The OP's data has a single row whose bounds are NA, and calling between directly on the Date columns fails:
df1 %>%
mutate(match = data.table::between(dateA, dateB, dateC))
Error: Problem with `mutate()` column `match`.
ℹ `match = data.table::between(dateA, dateB, dateC)`.
✖ Not yet implemented NAbounds=TRUE for this non-numeric and non-character type
Run `rlang::last_error()` to see where the error occurred.
Setting NAbounds to FALSE avoids the error:
df1 %>%
mutate(match = +(data.table::between(dateA, dateB, dateC, NAbounds = FALSE)))
ID dateA dateB dateC match
1: 1 2010-12-31 2010-01-01 2021-01-01 1
2: 2 2021-01-01 2012-01-01 2012-12-31 0
3: 2 2021-01-01 2013-01-01 2017-11-30 0
4: 2 2021-01-01 2017-12-01 2021-01-01 1
5: 3 2010-09-30 2010-05-01 2021-01-01 1
6: 4 2015-12-31 <NA> <NA> NA
7: 5 2010-09-30 2010-04-01 2021-01-01 1
8: 6 2018-10-31 2014-05-01 2016-10-31 0
9: 6 2018-10-31 2016-11-01 2021-01-01 1
10: 7 2016-02-01 2016-01-01 2021-01-01 1
11: 8 2015-05-01 2013-04-15 2014-12-31 0
12: 8 2015-05-01 2015-01-01 2015-05-01 1
13: 9 2013-09-01 2010-02-15 2013-01-01 0
14: 10 2016-01-01 2012-04-01 2021-01-01 1
Alternatively, pass a Date-typed NA as NAbounds via as.Date:
df1 %>%
mutate(match = +(data.table::between(dateA, dateB, dateC,
NAbounds = as.Date(NA))))
ID dateA dateB dateC match
1: 1 2010-12-31 2010-01-01 2021-01-01 1
2: 2 2021-01-01 2012-01-01 2012-12-31 0
3: 2 2021-01-01 2013-01-01 2017-11-30 0
4: 2 2021-01-01 2017-12-01 2021-01-01 1
5: 3 2010-09-30 2010-05-01 2021-01-01 1
6: 4 2015-12-31 <NA> <NA> NA
7: 5 2010-09-30 2010-04-01 2021-01-01 1
8: 6 2018-10-31 2014-05-01 2016-10-31 0
9: 6 2018-10-31 2016-11-01 2021-01-01 1
10: 7 2016-02-01 2016-01-01 2021-01-01 1
11: 8 2015-05-01 2013-04-15 2014-12-31 0
12: 8 2015-05-01 2015-01-01 2015-05-01 1
13: 9 2013-09-01 2010-02-15 2013-01-01 0
14: 10 2016-01-01 2012-04-01 2021-01-01 1
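Since everything above depends on which between actually gets called, it can help to check what an unqualified between resolves to in the current session. A small sketch, assuming both packages are installed and attached in this order:

```r
suppressPackageStartupMessages({
  library(data.table)
  library(dplyr)  # attached last, so its exports mask same-named data.table ones
})

# Name of the namespace that the unqualified between() comes from
which_between <- environmentName(environment(between))
which_between
```

Writing dplyr::between or data.table::between explicitly, as in the code above, sidesteps the question entirely.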