Make a new column based on a condition in already existing column in R

Time:02-05

For each spread_event, I need to make a new column, based on the existing leaf_wetness_duration column, that is TRUE if an uninterrupted (continuous, consecutive) dry period of at least 3 hours and at most 5 hours is observed.

Here is my data

Spread_event        date            leaf_wetness_duration
1             8/19/15 7:15 PM                  15
1             8/19/15 7:30 PM                  2
1             8/19/15 7:45 PM                  0
1             8/19/15 8:00 PM                  0
1             8/19/15 8:15 PM                  0
1             8/19/15 8:30 PM                  0
1             8/19/15 8:45 PM                  0
1             8/19/15 9:00 PM                  0
1             8/19/15 9:15 PM                  0
1             8/19/15 9:30 PM                  0
1             8/19/15 9:45 PM                  0
1             8/19/15 10:00 PM                 0
1             8/19/15 10:15 PM                 0
1             8/19/15 10:30 PM                 0
1             8/19/15 10:45 PM                 3

In the above example, the uninterrupted dry period is 180 minutes, or 3 hours: there are 12 consecutive rows with zero wetness, and each row represents 15 minutes (12 * 15 = 180 minutes).
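
As a quick check of that arithmetic, here is a minimal base R sketch. The vector `wetness` is typed in by hand from the table above and the variable names are only illustrative; it assumes every row covers exactly 15 minutes.

```r
# Values copied by hand from the leaf_wetness_duration column above
wetness <- c(15, 2, rep(0, 12), 3)

# rle() collapses the vector into runs of identical values
runs <- rle(wetness)

# Lengths (in rows) of the runs where the value is 0, i.e. the dry runs
dry_rows <- runs$lengths[runs$values == 0]

# Each row is assumed to cover 15 minutes
dry_minutes <- dry_rows * 15
dry_minutes
# [1] 180
```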

Here is a way to make a new column that is TRUE for a spread_event only if it contains at least 20 consecutive rows with leaf_wetness_duration == 0 (and all FALSE otherwise), based on this answer: Make a new column based on a condition in existing column in R

library(dplyr)
df %>%
  group_by(Spread_event) %>%
  mutate(
    longest_run_of_0 = with(rle(leaf_wetness_duration), max(lengths[values == 0]) >= 20)
  ) %>%
  ungroup()

# # A tibble: 15 × 4
#    Spread_event date             leaf_wetness_duration longest_run_of_0
#           <int> <chr>                            <int> <lgl>           
#  1            1 8/19/15 7:15 PM                     15 FALSE           
#  2            1 8/19/15 7:30 PM                      2 FALSE           
#  3            1 8/19/15 7:45 PM                      0 FALSE           
#  4            1 8/19/15 8:00 PM                      0 FALSE           
#  5            1 8/19/15 8:15 PM                      0 FALSE           
#  6            1 8/19/15 8:30 PM                      0 FALSE           
#  7            1 8/19/15 8:45 PM                      0 FALSE           
#  8            1 8/19/15 9:00 PM                      0 FALSE           
#  9            1 8/19/15 9:15 PM                      0 FALSE           
# 10            1 8/19/15 9:30 PM                      0 FALSE           
# 11            1 8/19/15 9:45 PM                      0 FALSE           
# 12            1 8/19/15 10:00 PM                     0 FALSE           
# 13            1 8/19/15 10:15 PM                     0 FALSE           
# 14            1 8/19/15 10:30 PM                     0 FALSE           
# 15            1 8/19/15 10:45 PM                     3 FALSE 
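
One edge case worth keeping in mind with this pattern: if a spread_event has no rows with leaf_wetness_duration == 0, max() is called on an empty vector, which returns -Inf with a warning. A small sketch of that behaviour, using a made-up vector `wet_only` (not from my data), and any() as one possible way to avoid the warning:

```r
# Hypothetical event with no dry rows at all
wet_only <- c(4, 7, 2)

r <- rle(wet_only)
zero_runs <- r$lengths[r$values == 0]   # empty: there are no runs of 0

# max() of an empty vector is -Inf and raises a warning (suppressed here)
flag_max <- suppressWarnings(max(zero_runs) >= 20)

# any() on an empty logical vector is simply FALSE, with no warning
flag_any <- any(zero_runs >= 20)

c(flag_max, flag_any)
```

Both forms give FALSE for this event; the any() form just avoids the warning.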

I would like to change the condition from at least 20 consecutive rows with leaf_wetness_duration == 0 to between 12 and 20 consecutive rows, inclusive (that is, >= 12 & <= 20). I replaced max(lengths[values == 0]) >= 20 in the above code with max(lengths[values == 0] >= 12 & <= 20), but I am getting an error. I tried replacing max with the filter function, thinking that filter would accept &, but it didn't work. I would also like to count such runs using count = with(rle(leaf_wetness_duration), sum(lengths[values == 0] >= 20)), but with the condition changed to >= 12 & <= 20.

In summary, I want the same output as in the answer to Make a new column based on a condition in existing column in R, but with the condition changed from 5 hours to between 3 and 5 hours (both inclusive).

CodePudding user response:

Since we don't have enough data to exercise the 3-5 hour constraint, I'm going to demonstrate with 1 and 2 hours instead: flag rows where there has been 1 hour of uninterrupted duration == 0 but not 2 hours. For this, I'll create two columns: one that indicates 1h (your 3h), and one that indicates 2h (your 5h); from their combination you can infer your desired logic.

library(dplyr)
library(slider) # slide_period_lgl
df %>%
  group_by(Spread_event) %>%
  mutate(
    # %I (12-hour clock) is required for %p (AM/PM) to take effect
    date = as.POSIXct(date, format = "%m/%d/%y %I:%M %p"),
    run1h = slider::slide_period_lgl(
      .x = leaf_wetness_duration, .i = date,
      .period = "minute", .before = 60,
      .f = ~ all(.x == 0)),
    run2h = slider::slide_period_lgl(
      .x = leaf_wetness_duration, .i = date,
      .period = "minute", .before = 120,
      .f = ~ all(.x == 0)),
    result1 = run1h & !run2h,
    result2 = any(result1)
  ) %>%
  ungroup()
# # A tibble: 15 × 7
#    Spread_event date                leaf_wetness_duration run1h run2h result1 result2
#           <int> <dttm>                              <int> <lgl> <lgl> <lgl>   <lgl>  
#  1            1 2015-08-19 19:15:00                    15 FALSE FALSE FALSE   TRUE   
#  2            1 2015-08-19 19:30:00                     2 FALSE FALSE FALSE   TRUE   
#  3            1 2015-08-19 19:45:00                     0 FALSE FALSE FALSE   TRUE   
#  4            1 2015-08-19 20:00:00                     0 FALSE FALSE FALSE   TRUE   
#  5            1 2015-08-19 20:15:00                     0 FALSE FALSE FALSE   TRUE   
#  6            1 2015-08-19 20:30:00                     0 FALSE FALSE FALSE   TRUE   
#  7            1 2015-08-19 20:45:00                     0 TRUE  FALSE TRUE    TRUE   
#  8            1 2015-08-19 21:00:00                     0 TRUE  FALSE TRUE    TRUE   
#  9            1 2015-08-19 21:15:00                     0 TRUE  FALSE TRUE    TRUE   
# 10            1 2015-08-19 21:30:00                     0 TRUE  FALSE TRUE    TRUE   
# 11            1 2015-08-19 21:45:00                     0 TRUE  TRUE  FALSE   TRUE   
# 12            1 2015-08-19 22:00:00                     0 TRUE  TRUE  FALSE   TRUE   
# 13            1 2015-08-19 22:15:00                     0 TRUE  TRUE  FALSE   TRUE   
# 14            1 2015-08-19 22:30:00                     0 TRUE  TRUE  FALSE   TRUE   
# 15            1 2015-08-19 22:45:00                     3 FALSE FALSE FALSE   TRUE   
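
If the readings are evenly spaced at 15 minutes with no gaps (an assumption; the slider approach works on the timestamps themselves and does not need it), a row-level flag for the real 3-5 hour window can also be sketched with base R's rle() rather than slider. This is a different technique from the one above, and the names `wetness`, `in_range` and `row_flag` are only illustrative:

```r
# Values copied by hand from the question's leaf_wetness_duration column
wetness <- c(15, 2, rep(0, 12), 3)

# Runs of dry (TRUE) and wet (FALSE) rows
r <- rle(wetness == 0)

# A run qualifies if it is dry and 12 to 20 rows long (3 to 5 hours at 15 min per row)
in_range <- r$values & r$lengths >= 12 & r$lengths <= 20

# Expand the per-run flag back to one value per row
row_flag <- rep(in_range, times = r$lengths)

# Per-event summary: does any qualifying dry run exist?
any(row_flag)
# [1] TRUE
```

Here the 12 zero rows are flagged TRUE and the three wet rows FALSE. Inside a grouped mutate() the same expressions could be applied to leaf_wetness_duration per Spread_event.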

CodePudding user response:

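The following code counts, for each spread event, the uninterrupted dry periods lasting 3 to 5 hours (12 to 20 consecutive rows with leaf_wetness_duration == 0):

library(dplyr)
df %>%
  group_by(Spread_event) %>%
  mutate(dry_period_3_5 = with(rle(leaf_wetness_duration),
    sum(lengths[values == 0] >= 12 & lengths[values == 0] <= 20))) %>%
  ungroup()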
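
The reason `>= 12 & <= 20` on its own fails is that R needs a complete comparison on each side of `&`, so the run lengths have to be named twice. A minimal base R sketch of the same idea, wrapped in a hypothetical helper `count_dry_runs()` and tried on made-up data for two spread events (the second event and the column `dry_period_count` are invented for illustration; no missing values are assumed):

```r
# Hypothetical helper: number of dry runs whose length is within [min_rows, max_rows]
count_dry_runs <- function(x, min_rows = 12, max_rows = 20) {
  r <- rle(x == 0)
  dry <- r$lengths[r$values]          # lengths of the dry runs only
  sum(dry >= min_rows & dry <= max_rows)
}

# Made-up data: event 1 mirrors the question, event 2 has a dry run that is too long
df <- data.frame(
  Spread_event = rep(1:2, times = c(15, 25)),
  leaf_wetness_duration = c(15, 2, rep(0, 12), 3,   # event 1: 12-row dry run
                            5, rep(0, 22), 4, 1)    # event 2: 22-row dry run
)

# ave() applies the helper within each Spread_event and repeats the result on every row
df$dry_period_count <- ave(df$leaf_wetness_duration, df$Spread_event,
                           FUN = count_dry_runs)
df$dry_period_3_5 <- df$dry_period_count > 0

unique(df[c("Spread_event", "dry_period_count", "dry_period_3_5")])
```

Event 1 gets a count of 1 and TRUE, while event 2 gets 0 and FALSE because its only dry run is 22 rows long. With dplyr, the helper could be called as mutate(dry_period_count = count_dry_runs(leaf_wetness_duration)) after group_by(Spread_event).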