Home > OS >  How can i find the longest/shortest number of days with condition in r?
How can i find the longest/shortest number of days with condition in r?

Time:04-11

Example: How many days has the longest period of NA update (having NA consecutively)

Date value
1/2/2020 NA
1/3/2020 NA
1/4/2020 3
1/5/2020 NA
1/6/2020 1
1/7/2020 3
1/8/2020 3
1/9/2020 NA
1/10/2020 3

->>The result for the longest: 4days (from 1/5/2020 to 1/9/2020 is the longest period)

enter image description here I had tried using the filter to list NA's date and get stuck...

CodePudding user response:

Here's an efficient approach using the data.table package.

# input data
df <-  data.frame(Date = c("1/2/2020","1/3/2020", "1/4/2020","1/5/2020","1/6/2020",
                           "1/7/2020","1/8/2020", "1/9/2020","1/10/2020"),
                  value = c(NA, NA, 3L, NA, 1L, 3L, 3L, NA, 3L))



library(data.table)

# convert Date column from character to date class
setDT(df)[, Date := as.IDate(Date, format="%m/%d/%Y")]

# create a column that tells when the last NA occured
df[ is.na(value),  days_since_last_na := Date  - shift(Date, type="lag")]

subset(df, days_since_last_na == max(days_since_last_na, na.rm=T))

>          Date value days_since_last_na
> 1: 2020-01-09    NA                  4

CodePudding user response:

this is your example data df:

df <- structure(list(Date = c("01.02.2020", "01.03.2020", "01.04.2020", 
"01.05.2020", "01.06.2020", "01.07.2020", "01.08.2020", "01.09.2020", 
"01.10.2020"), value = c(NA, NA, 3L, NA, 1L, 3L, 3L, NA, 3L)), class = "data.frame", row.names = c(NA, 
9L))

code suggestion:

library(dplyr)
library(lubridate) ## convenient date handling

df %>%
  filter(is.na(value)) %>%
  mutate(Date = lubridate::mdy(Date),
         from = Date,
         to = lead(Date, 1),
         duration = to - from
         ) %>%
  filter(!is.na(duration)) %>%
  ## extract observations of shortest and longest duration:
  summarise(across(everything(), ~  c(min(.x), max(.x))))

output:

##         Date value       from         to duration
## 1 2020-01-02    NA 2020-01-02 2020-01-03   1 days
## 2 2020-01-05    NA 2020-01-05 2020-01-09   4 days

CodePudding user response:

Here's an approach, using data.table

setDT(df)[is.na(value)][,diff:=c(0, diff(as.IDate(Date, "%m/%d/%y")))][which.max(diff)]

Similar approach using dplyr

df %>%
  filter(is.na(value)) %>%
  mutate(diff=c(0,diff(as.Date(Date,"%m/%d/%y")))) %>%
  slice_max(diff)

Output:

  Date     value  diff
  <chr>    <int> <dbl>
1 1/9/2020    NA     4
  •  Tags:  
  • r
  • Related