I'm new to R and to coding in general. I conducted an experiment testing the effect of copper on irrigation dripper blockage. I had two dripper types, "Copper" and "Normal", and once a week I recorded whether each dripper was working (1) or not (0). All the drippers were working (1) at the start and all of them were blocked (0) by the end of the experiment. I also suspect that some type of debris might block the drippers depending on their proximity to the water source, so the position columns (Lateral, Position) are important. A sample of the data:
structure(list(Date = structure(c(1660089600, 1660089600, 1660521600,
1660521600, 1660780800, 1660780800, 1661385600, 1661385600, 1661904000,
1661904000, 1662249600, 1662249600, 1662336000, 1662336000), tzone =
"UTC", class = c("POSIXct",
"POSIXt")), Lateral = structure(c(2L, 4L, 2L, 4L, 2L, 4L, 2L,
4L, 2L, 4L, 2L, 4L, 2L, 4L), levels = c("1", "2", "3", "4"), class =
"factor"),
Position = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L), levels = c("1", "2", "3", "4", "5",
"6"), class = "factor"), `Distance (m)` = c(0.9, 1.8, 0.9,
1.8, 0.9, 1.8, 0.9, 1.8, 0.9, 1.8, 0.9, 1.8, 0.9, 1.8), Type =
structure(c(2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), levels =
c("Copper",
"Normal"), class = "factor"), Working = c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L), id = c(6L, 18L,
6L, 18L, 6L, 18L, 6L, 18L, 6L, 18L, 6L, 18L, 6L, 18L)), row.names =
c(NA,
-14L), class = c("tbl_df", "tbl", "data.frame"))
I need to calculate how much time (in days) it took each dripper to become blocked (switch from 1 to 0) and remain blocked until the end of the experiment. If a dripper was blocked at one measurement but worked again at the next, that first 0 should be ignored (and perhaps the dripper flagged). I figured that I need to multiply the Working column by the time difference from the start for each dripper and, if the result equals zero, keep the previous date value, otherwise keep the current value. However, I don't understand how to write it. Any help will be much appreciated!
CodePudding user response:
I've made up an example dataset because you didn't provide a reproducible example, and I'm using the {tidyverse} package and style of code.
I'm assuming you have multiple drippers, each with its own id variable, and I'm also assuming each dripper gets measured once a day, so you can calculate how long it worked by subtracting the first date from the last date it worked.
Essentially, you find the first pair of consecutive days on which the dripper didn't work, discard all data from the second of those days onward for that particular dripper, and compute the difference between the last and first remaining dates (the last remaining date is the first day of the final blockage):
library(tidyverse)
# dripper 'a' is the simple case, it works for 4 days and stops after;
# dripper 'b' works for 3 days, stops on day 4, and continues for another 2 after, then stops for good
drippers_example <- tibble(
  id = rep(c("a","b"), each = 10),
  date = seq.Date(from = as.Date("2022-08-18"), by = 1, length.out = 20),
  working = c(rep(1,4),rep(0,6), c(1,1,1,0,1,1,0,0,0,0))
)
# here you compute when the drippers stopped working
drippers_processed <-
  drippers_example %>%
  group_by(id) %>% # process each dripper separately
  mutate(
    # TRUE when the current day is non-working and the previous day was non-working too
    nonworking_consecutive_days =
      working == 0 & lag(as.integer(working), n = 1, default = 1) == 0,
    # TRUE from the second consecutive non-working day onwards
    stopped_working = cumsum(nonworking_consecutive_days) > 0
  ) %>%
  ungroup()
dripper_summary <-
  drippers_processed %>%
  group_by(id) %>% # summarise each dripper separately
  filter(!stopped_working) %>% # keep entries up to and including the first day of the final blockage
  summarise(
    # last kept date minus the start date: the number of days until the dripper blocked
    days_worked = as.numeric(max(date) - min(date), units = "days"),
    .groups = "drop"
  )
If you check dripper_summary you should get the following, as expected from the example data:
> dripper_summary
# A tibble: 2 x 2
  id    days_worked
  <chr>       <dbl>
1 a               4
2 b               6
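One caveat with the rule above: it treats any two consecutive zeros as the final blockage, so a dripper that reads 0, 0 and then works again would be cut off too early. A possible variant, sketched below under my own assumptions, anchors on the last measurement where the dripper worked: the first measurement after it starts the final, unbroken run of zeros, and any zero before it marks a temporary block worth flagging. The helper `time_to_block()`, the data frame `measurements` and the output columns `days_to_block` and `had_temporary_block` are names I made up for illustration. The values come from the sample data in the question (ids 6 and 18), and the sketch uses base R only, so it does not depend on the code above.

```r
# Hypothetical helper (not from the answer above): summarise ONE dripper.
# date: measurement dates (Date or POSIXct); working: 1 = working, 0 = blocked.
time_to_block <- function(date, working) {
  ord <- order(date)              # make sure measurements are in time order
  date <- date[ord]
  working <- working[ord]

  ok <- which(working == 1)
  last_ok <- if (length(ok) > 0) max(ok) else 0L  # index of last working measurement

  # First measurement of the final run of zeros; NA if it still worked at the end
  idx <- if (last_ok < length(working)) last_ok + 1L else NA_integer_
  blocked_on <- date[idx]

  data.frame(
    days_to_block = as.numeric(difftime(blocked_on, date[1], units = "days")),
    # TRUE if the dripper read 0 at some point and then worked again later
    had_temporary_block = any(working[seq_len(last_ok)] == 0)
  )
}

# Sample data from the question: rows alternate between dripper 6 and dripper 18
stamps <- c(1660089600, 1660521600, 1660780800, 1661385600,
            1661904000, 1662249600, 1662336000)
measurements <- data.frame(
  id = rep(c(6L, 18L), times = 7),
  date = rep(as.POSIXct(stamps, origin = "1970-01-01", tz = "UTC"), each = 2),
  working = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L)
)

# Apply the helper to each dripper and stack the one-row results
per_dripper <- lapply(split(measurements, measurements$id), function(d) {
  data.frame(id = d$id[1], time_to_block(d$date, d$working))
})
block_summary <- do.call(rbind, per_dripper)
print(block_summary)
# dripper 6:  days_to_block 25, had_temporary_block FALSE
# dripper 18: days_to_block 25, had_temporary_block TRUE
```

Because the helper only compares dates, it should work with weekly or irregular measurements as well as daily ones. A dripper that never blocks gets NA for days_to_block.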