Example: How many days has the longest period of NA update (having NA consecutively)
Date | value |
---|---|
1/2/2020 | NA |
1/3/2020 | NA |
1/4/2020 | 3 |
1/5/2020 | NA |
1/6/2020 | 1 |
1/7/2020 | 3 |
1/8/2020 | 3 |
1/9/2020 | NA |
1/10/2020 | 3 |
->>The result for the longest: 4days (from 1/5/2020 to 1/9/2020 is the longest period)
I had tried using the filter to list NA's date and get stuck...
CodePudding user response:
Here's an efficient approach using the data.table
package.
# input data
df <- data.frame(Date = c("1/2/2020","1/3/2020", "1/4/2020","1/5/2020","1/6/2020",
"1/7/2020","1/8/2020", "1/9/2020","1/10/2020"),
value = c(NA, NA, 3L, NA, 1L, 3L, 3L, NA, 3L))
library(data.table)
# convert Date column from character to date class
setDT(df)[, Date := as.IDate(Date, format="%m/%d/%Y")]
# create a column that tells when the last NA occured
df[ is.na(value), days_since_last_na := Date - shift(Date, type="lag")]
subset(df, days_since_last_na == max(days_since_last_na, na.rm=T))
> Date value days_since_last_na
> 1: 2020-01-09 NA 4
CodePudding user response:
this is your example data df:
df <- structure(list(Date = c("01.02.2020", "01.03.2020", "01.04.2020",
"01.05.2020", "01.06.2020", "01.07.2020", "01.08.2020", "01.09.2020",
"01.10.2020"), value = c(NA, NA, 3L, NA, 1L, 3L, 3L, NA, 3L)), class = "data.frame", row.names = c(NA,
9L))
code suggestion:
library(dplyr)
library(lubridate) ## convenient date handling
df %>%
filter(is.na(value)) %>%
mutate(Date = lubridate::mdy(Date),
from = Date,
to = lead(Date, 1),
duration = to - from
) %>%
filter(!is.na(duration)) %>%
## extract observations of shortest and longest duration:
summarise(across(everything(), ~ c(min(.x), max(.x))))
output:
## Date value from to duration
## 1 2020-01-02 NA 2020-01-02 2020-01-03 1 days
## 2 2020-01-05 NA 2020-01-05 2020-01-09 4 days
CodePudding user response:
Here's an approach, using data.table
setDT(df)[is.na(value)][,diff:=c(0, diff(as.IDate(Date, "%m/%d/%y")))][which.max(diff)]
Similar approach using dplyr
df %>%
filter(is.na(value)) %>%
mutate(diff=c(0,diff(as.Date(Date,"%m/%d/%y")))) %>%
slice_max(diff)
Output:
Date value diff
<chr> <int> <dbl>
1 1/9/2020 NA 4