I have a large dataset that spans over 20 years. I have a column for the date and another column for the hour ending (HE). I'm trying to add a new column to provide the hour by hour (hrxhr) information in a given year (so running total). So date: Jan 1, 2023, HE: 1 should be hrxhr: 1 and Dec 31, 2023, HE: 24, should be hrxhr:8760 (8784 on leap years).
Should look like this:
YEAR | MONTH | DAY | HOUR OF DAY | Month_num | Date | Date1 | NEW COLUMN hrxhr |
---|---|---|---|---|---|---|---|
2023 | Dec | 31 | 22 | 12 | 2023-12-31 | 365 | 8758 |
2023 | Dec | 31 | 23 | 12 | 2023-12-31 | 365 | 8759 |
2023 | Dec | 31 | 24 | 12 | 2023-12-31 | 365 | 8760 |
2024 | Jan | 01 | 1 | 12 | 2024-01-01 | 1 | 1 |
2023 | Jan | 01 | 2 | 12 | 2024-01-01 | 1 | 2 |
At first I thought I could get the Julian date and then multiple that by the HE, but that is incorrect since Jan 2, 2023, HE:1 would then equal 2 but the hrxhr/running total should equal 25.
CodePudding user response:
If you are open to a tidyverse
/ lubridate
solution, you could use
library(dplyr)
library(lubridate)
df1 %>%
mutate(
begin = ymd_hms(paste(year(Date), "-01-01 00:00:00")),
target = ymd_hms(paste(Date, HOUR_OF_DAY, ":00:00")),
hrxhr = time_length(interval(begin, target), "hours")) %>%
select(-begin, -target)
This returns
# A tibble: 5 × 7
YEAR MONTH DAY HOUR_OF_DAY Month_num Date hrxhr
<dbl> <chr> <chr> <dbl> <dbl> <date> <dbl>
1 2023 Dec 31 22 12 2023-12-31 8758
2 2023 Dec 31 23 12 2023-12-31 8759
3 2023 Dec 31 24 12 2023-12-31 8760
4 2024 Jan 01 1 12 2024-01-01 1
5 2024 Jan 01 2 12 2024-01-01 2
Data
structure(list(YEAR = c(2023, 2023, 2023, 2024, 2024), MONTH = c("Dec",
"Dec", "Dec", "Jan", "Jan"), DAY = c("31", "31", "31", "01",
"01"), HOUR_OF_DAY = c(22, 23, 24, 1, 2), Month_num = c(12, 12,
12, 12, 12), Date = structure(c(19722, 19722, 19722, 19723, 19723
), class = "Date")), row.names = c(NA, -5L), class = "data.frame")
CodePudding user response:
In base R:
df <- data.frame(
YEAR = c(2023L, 2023L, 2023L, 2024L, 2023L),
MONTH = c("Dec", "Dec", "Dec", "Jan", "Jan"), DAY = c(31L, 31L, 31L, 1L, 1L),
HOUR_OF_DAY = c(22L, 23L, 24L, 1L, 2L), Month_num = c(12L,
12L, 12L, 12L, 12L), Date = c("2023-12-31", "2023-12-31",
"2023-12-31", "2024-01-01", "2024-01-01"), Date1 = c(365L,
365L, 365L, 1L, 1L))
df$hrxhr <- mapply(\(from, to, by) length(seq.POSIXt(from, to, by)),
from = trunc(as.POSIXlt(df$Date), "years"),
to = as.POSIXlt(df$Date),
by="1 hour") df$HOUR_OF_DAY - 1
df
#> YEAR MONTH DAY HOUR_OF_DAY Month_num Date Date1 hrxhr
#> 1 2023 Dec 31 22 12 2023-12-31 365 8758
#> 2 2023 Dec 31 23 12 2023-12-31 365 8759
#> 3 2023 Dec 31 24 12 2023-12-31 365 8760
#> 4 2024 Jan 1 1 12 2024-01-01 1 1
#> 5 2023 Jan 1 2 12 2024-01-01 1 2