R - create TRUE/FALSE column if dates are within a certain timeframe

Time:08-13

I am working with a data frame in which two of the columns are dates. I'd like to add a new column, let's call it RANGE, that is TRUE if the two columns' dates are within +/- 15 days of each other and FALSE otherwise.

My reproducible input:

df <- structure(list(i = 2:7, j = c(1L, 1L, 1L, 1L, 1L, 1L), distance = structure(c(5455.32297802238, 
5424.40832625047, 4430.5045023107, 6247.9601523913, 4064.97434408376, 
5910.5574526888), units = structure(list(numerator = "m", denominator = character(0)), class = "symbolic_units"), class = "units"), 
    HD_YEAR.i = c("2019", "2019", "2019", "2019", "2019", "2019"
    ), HD_YEAR.j = c("2019", "2019", "2019", "2019", "2019", 
    "2019"), TERRITORY.i = c("DISCGO", "SCHOLA", "COXFER", "LOWESP", 
    "SEACOA", "HGTCCC"), TERRITORY.j = c("RIVERF", "RIVERF", 
    "RIVERF", "RIVERF", "RIVERF", "RIVERF"), HATCH_DATE.i = structure(c(17989, 
    17992, 17982, 18010, 18005, 18017), class = "Date"), HATCH_DATE.j = structure(c(17981, 
    17981, 17981, 17981, 17981, 17981), class = "Date")), row.names = 2:7, class = "data.frame")

If done correctly, the new column should read:

|RANGE|
|-----|
|TRUE |
|TRUE |
|TRUE |
|FALSE|
|FALSE|
|FALSE|

I realize this is a simple task, but I am still learning. Afterwards I will filter out the FALSE rows, which I already know how to do.

CodePudding user response:

You can use difftime(), like this:

df$RANGE <- abs(difftime(df$HATCH_DATE.i, df$HATCH_DATE.j, units = "days")) < 15

Or, with dplyr:

library(dplyr)
df <- df %>% 
  mutate(RANGE = abs(difftime(HATCH_DATE.i, HATCH_DATE.j, units = "days")) < 15)

Output:

  i j distance HD_YEAR.i HD_YEAR.j TERRITORY.i TERRITORY.j HATCH_DATE.i HATCH_DATE.j RANGE
2 2 1 5455.323      2019      2019      DISCGO      RIVERF   2019-04-03   2019-03-26  TRUE
3 3 1 5424.408      2019      2019      SCHOLA      RIVERF   2019-04-06   2019-03-26  TRUE
4 4 1 4430.505      2019      2019      COXFER      RIVERF   2019-03-27   2019-03-26  TRUE
5 5 1 6247.960      2019      2019      LOWESP      RIVERF   2019-04-24   2019-03-26 FALSE
6 6 1 4064.974      2019      2019      SEACOA      RIVERF   2019-04-19   2019-03-26 FALSE
7 7 1 5910.557      2019      2019      HGTCCC      RIVERF   2019-05-01   2019-03-26 FALSE
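
One detail worth checking: the question asks for dates within +/- 15 days, while `< 15` treats a gap of exactly 15 days as FALSE. It makes no difference for the sample data (the gaps are 8, 11, 1, 29, 24 and 36 days), but it matters at the boundary. A minimal sketch on a small made-up data frame (`toy`, `a`, `b`, `RANGE_strict` and `RANGE_inclusive` are illustrative names, not from the question), assuming the inclusive reading might be the one wanted:

```r
# Toy data (hypothetical): gaps of 14, 15 and 16 days from 2019-03-26
toy <- data.frame(
  a = as.Date(c("2019-04-09", "2019-04-10", "2019-04-11")),
  b = as.Date("2019-03-26")
)

# Absolute gap between the two dates, as a plain number of days
gap_days <- abs(as.numeric(difftime(toy$a, toy$b, units = "days")))

toy$RANGE_strict    <- gap_days < 15   # a gap of exactly 15 days is FALSE
toy$RANGE_inclusive <- gap_days <= 15  # a gap of exactly 15 days is TRUE
```

If "within 15 days" is meant to include day 15 itself, swapping `<` for `<=` in either snippet above should be the only change needed.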

CodePudding user response:

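You could use the lubridate package. Note that the ymd() calls are only needed when the dates are stored as character strings; the columns here are already of class Date, so they are left unchanged. For example: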

library(dplyr)
library(lubridate)
df %>%
  # the comparison already yields TRUE/FALSE, so no ifelse() is needed
  mutate(RANGE = abs(ymd(HATCH_DATE.i) - ymd(HATCH_DATE.j)) < 15) %>%
  # keep only the rows where RANGE is TRUE
  filter(RANGE)

Output:

  i j distance HD_YEAR.i HD_YEAR.j TERRITORY.i TERRITORY.j HATCH_DATE.i HATCH_DATE.j RANGE
1 2 1 5455.323      2019      2019      DISCGO      RIVERF   2019-04-03   2019-03-26  TRUE
2 3 1 5424.408      2019      2019      SCHOLA      RIVERF   2019-04-06   2019-03-26  TRUE
3 4 1 4430.505      2019      2019      COXFER      RIVERF   2019-03-27   2019-03-26  TRUE
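
For the follow-up step mentioned in the question (dropping the FALSE rows), base R logical indexing also works without dplyr. A minimal sketch on a made-up data frame (`toy`, `id`, `a`, `b` and `kept` are illustrative names); with the question's data the same pattern would be applied to `HATCH_DATE.i` and `HATCH_DATE.j`:

```r
# Toy data (hypothetical): gaps of 8, 1 and 29 days from 2019-03-26
toy <- data.frame(
  id = 1:3,
  a  = as.Date(c("2019-04-03", "2019-03-27", "2019-04-24")),
  b  = as.Date("2019-03-26")
)

# Flag rows whose dates are fewer than 15 days apart
toy$RANGE <- abs(difftime(toy$a, toy$b, units = "days")) < 15

# Keep only the flagged rows; which() drops any NA flags instead of
# returning all-NA rows
kept <- toy[which(toy$RANGE), ]
```

Here `kept` retains the rows with `id` 1 and 2 and their original row names, whereas the dplyr `filter()` output above is renumbered from 1.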