For each Spread_event, I need to make a new column, based on the existing leaf_wetness_duration
column, that is TRUE if an uninterrupted (continuous, consecutive) dry period of at least 3 hours
and at most 5 hours is observed.
Here is my data:
Spread_event date leaf_wetness_duration
1 8/19/15 7:15 PM 15
1 8/19/15 7:30 PM 2
1 8/19/15 7:45 PM 0
1 8/19/15 8:00 PM 0
1 8/19/15 8:15 PM 0
1 8/19/15 8:30 PM 0
1 8/19/15 8:45 PM 0
1 8/19/15 9:00 PM 0
1 8/19/15 9:15 PM 0
1 8/19/15 9:30 PM 0
1 8/19/15 9:45 PM 0
1 8/19/15 10:00 PM 0
1 8/19/15 10:15 PM 0
1 8/19/15 10:30 PM 0
1 8/19/15 10:45 PM 3
In the above example, the uninterrupted dry period is 180 minutes, or 3 hours: there are 12 rows with zero wetness, and each row represents 15 minutes (12 * 15 = 180 minutes).
Here is a way to make a new column that is all FALSE if there are no 20 consecutive rows with leaf_wetness_duration == 0 in a spread event, based on this answer: Make a new column based on a condition in existing column in R
library(dplyr)
df %>%
  group_by(Spread_event) %>%
  mutate(
    longest_run_of_0 = with(rle(leaf_wetness_duration), max(lengths[values == 0]) >= 20)
  ) %>%
  ungroup()
# # A tibble: 15 × 4
# Spread_event date leaf_wetness_duration longest_run_of_0
# <int> <chr> <int> <lgl>
# 1 1 8/19/15 7:15 PM 15 FALSE
# 2 1 8/19/15 7:30 PM 2 FALSE
# 3 1 8/19/15 7:45 PM 0 FALSE
# 4 1 8/19/15 8:00 PM 0 FALSE
# 5 1 8/19/15 8:15 PM 0 FALSE
# 6 1 8/19/15 8:30 PM 0 FALSE
# 7 1 8/19/15 8:45 PM 0 FALSE
# 8 1 8/19/15 9:00 PM 0 FALSE
# 9 1 8/19/15 9:15 PM 0 FALSE
# 10 1 8/19/15 9:30 PM 0 FALSE
# 11 1 8/19/15 9:45 PM 0 FALSE
# 12 1 8/19/15 10:00 PM 0 FALSE
# 13 1 8/19/15 10:15 PM 0 FALSE
# 14 1 8/19/15 10:30 PM 0 FALSE
# 15 1 8/19/15 10:45 PM 3 FALSE
I would like to change the condition from 20 consecutive rows with leaf_wetness_duration == 0 to a run of at least 12 and at most 20 consecutive rows, that is, >= 12 & <= 20. I replaced max(lengths[values == 0]) >= 20 in the above code with max(lengths[values == 0] >=12 & <=20)), but I am getting an error. I tried replacing max with the filter function, thinking that filter would accept &, but it didn't work. I would also like to count such runs using count = with(rle(leaf_wetness_duration), sum(lengths[values == 0] >= 20)), but the condition needs to be changed to >= 12 & <= 20.
In summary, I want the same output as answered here: Make a new column based on a condition in existing column in R, but I want the condition to change from 5 hours to between 3 and 5 hours (both inclusive).
CodePudding user response:
Since we don't have enough data to challenge the 3-5 hour constraint, I'm going to demonstrate with 1 and 2 hours instead: whether there is 1 hour of uninterrupted leaf_wetness_duration == 0 and not 2 hours of uninterrupted leaf_wetness_duration == 0. For this, I'll create two columns: one that indicates 1h (your 3h), and one that indicates 2h (your 5h); from that you can use the combination to infer your desired logic.
library(dplyr)
library(slider) # slide_period_lgl
df %>%
  group_by(Spread_event) %>%
  mutate(
    # %I (12-hour clock) is required for %p (AM/PM) to take effect; %H ignores it
    date = as.POSIXct(date, format = "%m/%d/%y %I:%M %p"),
    run1h = slider::slide_period_lgl(
      .x = leaf_wetness_duration, .i = date,
      .period = "minute", .before = 60,
      .f = ~ all(.x == 0)),
    run2h = slider::slide_period_lgl(
      .x = leaf_wetness_duration, .i = date,
      .period = "minute", .before = 120,
      .f = ~ all(.x == 0)),
    result1 = run1h & !run2h,
    result2 = any(result1)
  ) %>%
  ungroup()
# # A tibble: 15 × 7
# Spread_event date leaf_wetness_duration run1h run2h result1 result2
# <int> <dttm> <int> <lgl> <lgl> <lgl> <lgl>
# 1 1 2015-08-19 19:15:00 15 FALSE FALSE FALSE TRUE
# 2 1 2015-08-19 19:30:00 2 FALSE FALSE FALSE TRUE
# 3 1 2015-08-19 19:45:00 0 FALSE FALSE FALSE TRUE
# 4 1 2015-08-19 20:00:00 0 FALSE FALSE FALSE TRUE
# 5 1 2015-08-19 20:15:00 0 FALSE FALSE FALSE TRUE
# 6 1 2015-08-19 20:30:00 0 FALSE FALSE FALSE TRUE
# 7 1 2015-08-19 20:45:00 0 TRUE FALSE TRUE TRUE
# 8 1 2015-08-19 21:00:00 0 TRUE FALSE TRUE TRUE
# 9 1 2015-08-19 21:15:00 0 TRUE FALSE TRUE TRUE
# 10 1 2015-08-19 21:30:00 0 TRUE FALSE TRUE TRUE
# 11 1 2015-08-19 21:45:00 0 TRUE TRUE FALSE TRUE
# 12 1 2015-08-19 22:00:00 0 TRUE TRUE FALSE TRUE
# 13 1 2015-08-19 22:15:00 0 TRUE TRUE FALSE TRUE
# 14 1 2015-08-19 22:30:00 0 TRUE TRUE FALSE TRUE
# 15 1 2015-08-19 22:45:00 3 FALSE FALSE FALSE TRUE
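One thing to watch with the windowed columns: result1 is evaluated row by row, so a dry run that is longer than the upper bound still passes through rows where the short window is satisfied and the long one is not yet. In the output above, result2 is TRUE although the run of zeros spans 12 rows. If the goal is to classify whole runs by their total length, a run-length check may be closer to what is wanted. The following is a minimal base R sketch, not part of the answer above; the helper name zero_run_in_range is hypothetical, and it assumes 15-minute readings (4 rows per hour, following the question's convention) with no missing values.

```r
# Hypothetical helper: TRUE if at least one uninterrupted run of zeros
# has a total length between min_rows and max_rows (both inclusive).
zero_run_in_range <- function(x, min_rows, max_rows) {
  r <- rle(x == 0)                   # runs of "is zero" / "is not zero"
  zero_lengths <- r$lengths[r$values] # keep only the lengths of the zero runs
  any(zero_lengths >= min_rows & zero_lengths <= max_rows)
}

# The sample event from the question: 12 consecutive zero rows.
wetness <- c(15, 2, rep(0, 12), 3)

one_to_two_hours    <- zero_run_in_range(wetness, 4, 8)    # 1-2 hour range
three_to_five_hours <- zero_run_in_range(wetness, 12, 20)  # 3-5 hour range
print(c(one_to_two_hours, three_to_five_hours))
```

With this sample the 12-row run falls outside the 1-2 hour range and inside the 3-5 hour range, which differs from the per-row window result for the 1-2 hour case.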
CodePudding user response:
The following code was needed to count the continuous 3-5 hour dry periods (runs of zero leaf wetness) in each spread event:
df %>%
  group_by(Spread_event) %>%
  mutate(
    dry_period_3_5 = with(rle(leaf_wetness_duration), sum(lengths[values == 0] >= 12 & lengths[values == 0] <= 20))
  ) %>%
  ungroup()
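To see the count behave across more than one event, here is a small self-contained sketch in base R. The toy data, the helper name count_dry_runs, and the columns dry_count_3_5 and dry_flag_3_5 are made up for illustration; it assumes 15-minute readings, so 12 rows stand for 3 hours and 20 rows for 5 hours. It also shows why the attempt in the question failed: in R each comparison needs its own left-hand side, so `>= 12 & <= 20` on its own is a syntax error.

```r
# Toy data (invented for illustration): two spread events.
df <- data.frame(
  Spread_event = rep(c(1, 2), times = c(15, 30)),
  leaf_wetness_duration = c(
    15, 2, rep(0, 12), 3,        # event 1: one dry run of 12 rows (3 hours)
    5, rep(0, 25), 4, rep(0, 3)  # event 2: dry runs of 25 rows and 3 rows
  )
)

# Hypothetical helper: number of zero runs with min_rows <= length <= max_rows.
count_dry_runs <- function(x, min_rows = 12, max_rows = 20) {
  zero_lengths <- with(rle(x), lengths[values == 0])
  # Both sides of `&` must be complete comparisons.
  sum(zero_lengths >= min_rows & zero_lengths <= max_rows)
}

# ave() computes the count per event and repeats it on every row of that event.
df$dry_count_3_5 <- ave(df$leaf_wetness_duration, df$Spread_event,
                        FUN = count_dry_runs)
df$dry_flag_3_5 <- df$dry_count_3_5 > 0

# One summary row per event.
print(unique(df[c("Spread_event", "dry_count_3_5", "dry_flag_3_5")]))
```

In a dplyr pipeline, the same helper could presumably be called inside mutate() after group_by(Spread_event), which keeps the range condition in one place.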