The data frame
that I'm working with is shown below:
yyyy mm dd hour
2011 5 1 10
2011 5 1 12
2011 5 1 13
2011 5 1 14
2011 5 1 15
2011 5 1 16
2011 5 1 17
2011 5 1 20
2011 5 2 11
2011 5 2 12
2011 5 2 13
2011 5 2 14
2011 5 2 15
2011 5 2 16
2011 5 2 17
2011 5 2 18
2011 5 3 10
[...]
I would like to add a new column that reports the number of consecutive hours, as shown below:
yyyy mm dd hour event
2011 5 1 10 1
2011 5 1 12 6
2011 5 1 13 6
2011 5 1 14 6
2011 5 1 15 6
2011 5 1 16 6
2011 5 1 17 6
2011 5 1 20 1
2011 5 2 11 8
2011 5 2 12 8
2011 5 2 13 8
2011 5 2 14 8
2011 5 2 15 8
2011 5 2 16 8
2011 5 2 17 8
2011 5 2 18 8
2011 5 3 10 1
[...]
The number of consecutive values ([mm], [dd]
) of the hours
parameter is reported by the value in the event
column ([mm] [dd]
).
Any suggestion?
CodePudding user response:
Both base R solutions below are cumsum/ave
solutions. There are 2 solutions following Wimpel's comment to the question.
1.
If events cannot span multiple days.
i <- c(0, abs(diff(df1$hour)) != 1)
ave(cumsum(i), cumsum(i), FUN = length)
# [1] 1 6 6 6 6 6 6 1 8 8 8 8 8 8 8 8 1
2.
If events can span multiple days.
h <- with(df1, ISOdatetime(yyyy, mm, dd, hour, 0L, 0L))
j <- c(0, abs(diff(h)) != 1)
ave(cumsum(j), cumsum(j), FUN = length)
# [1] 1 6 6 6 6 6 6 1 8 8 8 8 8 8 8 8 1
And assign the result of ave
to the new column.
df1$event <- ave(<as above>)
Data
df1 <-
structure(list(yyyy = c(2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
2011L, 2011L), mm = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L), dd = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L), hour = c(10L, 12L, 13L,
14L, 15L, 16L, 17L, 20L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
10L)), class = "data.frame", row.names = c(NA, -17L))
CodePudding user response:
Here's a dplyr
solution:
df %>%
group_by(yyyy,mm,dd) %>%
mutate(event = ifelse(hour == lead(hour) -1 | hour == lag(hour) 1,1,0)) %>%
mutate(event = ifelse(is.na(event),0,event)) %>%
group_by(yyyy,mm,dd,event) %>%
mutate(event = sum(event)) %>%
mutate(event = ifelse(event == 0, 1, event)) # If removed, the ungrouped values will have event = 0
Output:
# A tibble: 17 x 5
# Groups: yyyy, mm, dd, event [4]
yyyy mm dd hour event
<int> <int> <int> <int> <dbl>
1 2011 5 1 10 1
2 2011 5 1 12 6
3 2011 5 1 13 6
4 2011 5 1 14 6
5 2011 5 1 15 6
6 2011 5 1 16 6
7 2011 5 1 17 6
8 2011 5 1 20 1
9 2011 5 2 11 8
10 2011 5 2 12 8
11 2011 5 2 13 8
12 2011 5 2 14 8
13 2011 5 2 15 8
14 2011 5 2 16 8
15 2011 5 2 17 8
16 2011 5 2 18 8
17 2011 5 3 10 1