I have a question regarding summation of columns with different conditions and would really like some help.
Consider this data table:
animal on | animal off | Time |
---|---|---|
cat | dog | 0 |
dog | cat | 10 |
cat | dog | 30 |
dog | cat | 40 |
cat | dog | 50 |
horse | cat | 60 |
cat | horse | 100 |
dog(END) | cat(END) | 110 |
I'd like to calculate the max and average time that an animal stays on a paddock. This simple example holds three animals, but in reality there are hundreds! Each stay lasts from the row where the animal comes on until the Time of the next row. Looking at the Time column, the dog stays on for a maximum of twenty minutes, between rows two and three. So its max is 20 and its average is 15 minutes (one period of 20 and one period of 10). Likewise, the cat stays on for a maximum of ten minutes and an average of ten minutes (it comes on the paddock four times, for ten minutes each).
So my output would look like this:
animal | Max time | Average Time |
---|---|---|
cat | 10 | 10 |
dog | 20 | 15 |
horse | 40 | 40 |
Any help would be appreciated!
CodePudding user response:
Use `diff`, `group_by` and `summarise`:
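library(dplyr)

df %>%
  # time until the next row, i.e. how long the animal in `animal on` stays on
  mutate(time_diff = c(diff(Time), NA)) %>%
  # the last row has no next row, so drop it instead of giving it its own group
  filter(!is.na(time_diff)) %>%
  group_by(`animal on`) %>%
  summarise(
    `Max time` = max(time_diff, na.rm = TRUE),
    `Average Time` = mean(time_diff, na.rm = TRUE)
  )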
# A tibble: 2 × 3
  `animal on` `Max time` `Average Time`
  <chr>            <dbl>          <dbl>
1 cat                 10             10
2 dog                 20             15
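For a reproducible check, here is a minimal sketch of the same approach on the full table from the question. How `df` is built and the name `result` are assumptions on my part, since the question does not show how the data is stored. I also assume the last row (the one marked `(END)`) only closes the final stay, so it is dropped before grouping.

```r
library(dplyr)

# Assumed construction of the question's table; "(END)" is kept as typed
df <- tibble(
  `animal on`  = c("cat", "dog", "cat", "dog", "cat", "horse", "cat", "dog(END)"),
  `animal off` = c("dog", "cat", "dog", "cat", "dog", "cat", "horse", "cat(END)"),
  Time         = c(0, 10, 30, 40, 50, 60, 100, 110)
)

result <- df %>%
  # difference to the next row's Time; padded with NA for the last row
  mutate(time_diff = c(diff(Time), NA)) %>%
  # the last row has no following event, so it carries no stay length
  filter(!is.na(time_diff)) %>%
  group_by(`animal on`) %>%
  summarise(
    `Max time` = max(time_diff),
    `Average Time` = mean(time_diff)
  )

result
```

On this data `result` should have three rows: cat with 10 and 10, dog with 20 and 15, and horse with 40 and 40, which matches the output asked for in the question.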