Home > database >  How to create new column based off information in other columns in R
How to create new column based off information in other columns in R

Time:01-19

I have a large dataset that spans over 20 years. I have a column for the date and another column for the hour ending (HE). I'm trying to add a new column to provide the hour by hour (hrxhr) information in a given year (so running total). So date: Jan 1, 2023, HE: 1 should be hrxhr: 1 and Dec 31, 2023, HE: 24, should be hrxhr:8760 (8784 on leap years).

Should look like this:

YEAR MONTH DAY HOUR OF DAY Month_num Date Date1 NEW COLUMN hrxhr
2023 Dec 31 22 12 2023-12-31 365 8758
2023 Dec 31 23 12 2023-12-31 365 8759
2023 Dec 31 24 12 2023-12-31 365 8760
2024 Jan 01 1 12 2024-01-01 1 1
2023 Jan 01 2 12 2024-01-01 1 2

At first I thought I could get the Julian date and then multiple that by the HE, but that is incorrect since Jan 2, 2023, HE:1 would then equal 2 but the hrxhr/running total should equal 25.

CodePudding user response:

If you are open to a tidyverse / lubridate solution, you could use

library(dplyr)
library(lubridate)

df1 %>% 
  mutate(
    begin  = ymd_hms(paste(year(Date), "-01-01 00:00:00")),
    target = ymd_hms(paste(Date,  HOUR_OF_DAY, ":00:00")),
    hrxhr = time_length(interval(begin, target), "hours")) %>% 
  select(-begin, -target)

This returns

# A tibble: 5 × 7
   YEAR MONTH DAY   HOUR_OF_DAY Month_num Date       hrxhr
  <dbl> <chr> <chr>       <dbl>     <dbl> <date>     <dbl>
1  2023 Dec   31             22        12 2023-12-31  8758
2  2023 Dec   31             23        12 2023-12-31  8759
3  2023 Dec   31             24        12 2023-12-31  8760
4  2024 Jan   01              1        12 2024-01-01     1
5  2024 Jan   01              2        12 2024-01-01     2

Data

structure(list(YEAR = c(2023, 2023, 2023, 2024, 2024), MONTH = c("Dec", 
"Dec", "Dec", "Jan", "Jan"), DAY = c("31", "31", "31", "01", 
"01"), HOUR_OF_DAY = c(22, 23, 24, 1, 2), Month_num = c(12, 12, 
12, 12, 12), Date = structure(c(19722, 19722, 19722, 19723, 19723
), class = "Date")), row.names = c(NA, -5L), class = "data.frame")

CodePudding user response:

In base R:

df <- data.frame(
    YEAR = c(2023L, 2023L, 2023L, 2024L, 2023L), 
    MONTH = c("Dec", "Dec", "Dec", "Jan", "Jan"), DAY = c(31L, 31L, 31L, 1L, 1L), 
    HOUR_OF_DAY = c(22L, 23L, 24L, 1L, 2L), Month_num = c(12L, 
    12L, 12L, 12L, 12L), Date = c("2023-12-31", "2023-12-31", 
    "2023-12-31", "2024-01-01", "2024-01-01"), Date1 = c(365L, 
    365L, 365L, 1L, 1L))


df$hrxhr <- mapply(\(from, to, by) length(seq.POSIXt(from, to, by)),
       from = trunc(as.POSIXlt(df$Date), "years"), 
        to = as.POSIXlt(df$Date), 
        by="1 hour")   df$HOUR_OF_DAY - 1

df
#>   YEAR MONTH DAY HOUR_OF_DAY Month_num       Date Date1 hrxhr
#> 1 2023   Dec  31          22        12 2023-12-31   365  8758
#> 2 2023   Dec  31          23        12 2023-12-31   365  8759
#> 3 2023   Dec  31          24        12 2023-12-31   365  8760
#> 4 2024   Jan   1           1        12 2024-01-01     1     1
#> 5 2023   Jan   1           2        12 2024-01-01     1     2
  •  Tags:  
  • r
  • Related