I am working with a data frame in which two of the columns are dates. I'd like to add a new column, let's call it RANGE, that is TRUE if the two dates are within +/- 15 days of each other and FALSE otherwise.
My reproducible input:
df <- structure(list(i = 2:7, j = c(1L, 1L, 1L, 1L, 1L, 1L), distance = structure(c(5455.32297802238,
5424.40832625047, 4430.5045023107, 6247.9601523913, 4064.97434408376,
5910.5574526888), units = structure(list(numerator = "m", denominator = character(0)), class = "symbolic_units"), class = "units"),
HD_YEAR.i = c("2019", "2019", "2019", "2019", "2019", "2019"
), HD_YEAR.j = c("2019", "2019", "2019", "2019", "2019",
"2019"), TERRITORY.i = c("DISCGO", "SCHOLA", "COXFER", "LOWESP",
"SEACOA", "HGTCCC"), TERRITORY.j = c("RIVERF", "RIVERF",
"RIVERF", "RIVERF", "RIVERF", "RIVERF"), HATCH_DATE.i = structure(c(17989,
17992, 17982, 18010, 18005, 18017), class = "Date"), HATCH_DATE.j = structure(c(17981,
17981, 17981, 17981, 17981, 17981), class = "Date")), row.names = 2:7, class = "data.frame")
If done correctly, the new column should read:
|RANGE|
|-----|
|TRUE |
|TRUE |
|TRUE |
|FALSE|
|FALSE|
|FALSE|
I realize this is a simple task, but I am still learning. Afterwards I will filter out the FALSE rows, but that part I can do.
CodePudding user response:
You can use difftime(), like this:
df$RANGE <- abs(difftime(df$HATCH_DATE.i, df$HATCH_DATE.j, units = "days")) < 15
Or, with dplyr:
library(dplyr)
df <- df %>%
  mutate(RANGE = abs(difftime(HATCH_DATE.i, HATCH_DATE.j, units = "days")) < 15)
Output:
i j distance HD_YEAR.i HD_YEAR.j TERRITORY.i TERRITORY.j HATCH_DATE.i HATCH_DATE.j RANGE
2 2 1 5455.323 2019 2019 DISCGO RIVERF 2019-04-03 2019-03-26 TRUE
3 3 1 5424.408 2019 2019 SCHOLA RIVERF 2019-04-06 2019-03-26 TRUE
4 4 1 4430.505 2019 2019 COXFER RIVERF 2019-03-27 2019-03-26 TRUE
5 5 1 6247.960 2019 2019 LOWESP RIVERF 2019-04-24 2019-03-26 FALSE
6 6 1 4064.974 2019 2019 SEACOA RIVERF 2019-04-19 2019-03-26 FALSE
7 7 1 5910.557 2019 2019 HGTCCC RIVERF 2019-05-01 2019-03-26 FALSE
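To see why the last three rows come out FALSE, it can help to look at the raw day differences before comparing them. The following is a minimal base R sketch, not part of the answer above: `dates` is a cut-down stand-in for the question's `df` holding only the two date columns, and `DIFF_DAYS` is a hypothetical helper column. It also assumes an inclusive window (`<=`), so that two dates exactly 15 days apart would count as within range; with `< 15` they would not.

```r
# Cut-down stand-in for df: only the two date columns from the question.
dates <- data.frame(
  HATCH_DATE.i = as.Date(c(17989, 17992, 17982, 18010, 18005, 18017),
                         origin = "1970-01-01"),
  HATCH_DATE.j = as.Date(rep(17981, 6), origin = "1970-01-01")
)

# Subtracting two Date vectors gives a difftime in days;
# as.numeric() turns it into plain numbers.
dates$DIFF_DAYS <- as.numeric(dates$HATCH_DATE.i - dates$HATCH_DATE.j)

# Inclusive window (assumption): a difference of exactly 15 days counts as TRUE.
dates$RANGE <- abs(dates$DIFF_DAYS) <= 15

print(dates)
```

For this data the differences are 8, 11, 1, 29, 24 and 36 days, so `< 15` and `<= 15` give the same RANGE column; the choice only matters when a pair is exactly 15 days apart.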
CodePudding user response:
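You could use the lubridate package, e.g.
library(dplyr)
library(lubridate)
df %>%
  # The columns are already Date, so ymd() leaves them unchanged;
  # it is only needed if the dates are stored as character strings.
  mutate(RANGE = abs(ymd(HATCH_DATE.i) - ymd(HATCH_DATE.j)) < 15) %>%
  # The comparison already returns TRUE/FALSE, so no ifelse() is needed.
  filter(RANGE)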
Output:
i j distance HD_YEAR.i HD_YEAR.j TERRITORY.i TERRITORY.j HATCH_DATE.i HATCH_DATE.j RANGE
1 2 1 5455.323 2019 2019 DISCGO RIVERF 2019-04-03 2019-03-26 TRUE
2 3 1 5424.408 2019 2019 SCHOLA RIVERF 2019-04-06 2019-03-26 TRUE
3 4 1 4430.505 2019 2019 COXFER RIVERF 2019-03-27 2019-03-26 TRUE
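If the window size might change, or the same check is needed on other pairs of date columns, the comparison can be wrapped in a small helper and used to filter directly, without storing a RANGE column. This is only a sketch under stated assumptions: `within_days` is a hypothetical function name, the window is treated as inclusive, and `hatch` is a cut-down stand-in for the question's `df`.

```r
# Hypothetical helper: TRUE where two Date vectors are at most `n` days apart.
within_days <- function(a, b, n = 15) {
  abs(as.numeric(difftime(a, b, units = "days"))) <= n
}

# Cut-down stand-in for df, using values from the question.
hatch <- data.frame(
  TERRITORY.i  = c("DISCGO", "SCHOLA", "COXFER", "LOWESP", "SEACOA", "HGTCCC"),
  HATCH_DATE.i = as.Date(c(17989, 17992, 17982, 18010, 18005, 18017),
                         origin = "1970-01-01"),
  HATCH_DATE.j = as.Date(rep(17981, 6), origin = "1970-01-01")
)

# Keep only the rows inside the window (base R logical indexing).
kept <- hatch[within_days(hatch$HATCH_DATE.i, hatch$HATCH_DATE.j), ]
print(kept)
```

Passing a different `n`, for example `within_days(a, b, n = 30)`, widens the window without touching the rest of the code.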