Home > Software engineering >  Sorting a dataframe into date and time
Sorting a dataframe into date and time

Time:07-05

I have imported data from excel that looks like this:

Time_int <- c('01FEB2022_00:00:00.000000', '00:00 - 06:00', '06:00 - 12:00', '12:00 - 18:00', '18:00 - 00:00', 
              '02FEB2022_00:00:00.000000', '00:00 - 06:00', '06:00 - 12:00', '12:00 - 18:00', '18:00 - 00:00',
              '03FEB2022_00:00:00.000000', '00:00 - 06:00', '06:00 - 12:00', '12:00 - 18:00', '18:00 - 00:00')
Values <- c(NA, 2682, 2689, 2690, 2686, NA, 2717, 2754, 2745, 2734, NA, 2735, 2746, 2753, 2701)
df <- cbind(Time_int, Values)

                 Time_int Values
01FEB2022_00:00:00.000000   <NA>
            00:00 - 06:00   2682
            06:00 - 12:00   2689
            12:00 - 18:00   2690
            18:00 - 00:00   2686
02FEB2022_00:00:00.000000   <NA>
            00:00 - 06:00   2717
            06:00 - 12:00   2754
            12:00 - 18:00   2745
            18:00 - 00:00   2734
03FEB2022_00:00:00.000000   <NA>
            00:00 - 06:00   2735
            06:00 - 12:00   2746
            12:00 - 18:00   2753
            18:00 - 00:00   2701

The issue with this is that the date may interrupt my time-series calculations. So I want to be able to have the date in its own column and the time intervals in a separate column so that the values are not disrupted by the date. I would like to produce it so that it looks more like this:

Date <- c('2022/02/01', '2022/02/01', '2022/02/01', '2022/02/01',
          '2022/02/02', '2022/02/02', '2022/02/02', '2022/02/02', 
          '2022/02/03', '2022/02/03', '2022/02/03', '2022/02/03')
Int <- c('00:00 - 06:00', '06:00 - 12:00', '12:00 - 18:00', '18:00 - 00:00', 
         '00:00 - 06:00', '06:00 - 12:00', '12:00 - 18:00', '18:00 - 00:00',
         '00:00 - 06:00', '06:00 - 12:00', '12:00 - 18:00', '18:00 - 00:00')
V <- c(2682, 2689, 2690, 2686, 2717, 2754, 2745, 2734, 2735, 2746, 2753, 2701)
df2 <- cbind(Date, Int, V)

      Date           Int    V
2022-02-01 00:00 - 06:00 2682
2022-02-01 06:00 - 12:00 2689
2022-02-01 12:00 - 18:00 2690
2022-02-01 18:00 - 00:00 2686
2022-02-02 00:00 - 06:00 2717
2022-02-02 06:00 - 12:00 2754
2022-02-02 12:00 - 18:00 2745
2022-02-02 18:00 - 00:00 2734
2022-02-03 00:00 - 06:00 2735
2022-02-03 06:00 - 12:00 2746
2022-02-03 12:00 - 18:00 2753
2022-02-03 18:00 - 00:00 2701

I am also open to any other possible solutions since I am not entirely sure if this is the correct table format to use when testing intraday data.

CodePudding user response:

Here is a tidyverse approach:

library(dplyr); library(tidyr)

df |>
    data.frame() |>
    mutate(
        Date = as.Date(
            gsub("_. ", "", Time_int), format = "%d%B%Y"
        )
    ) |>
    fill(Date) |>
    drop_na(Values) |>
    rename(
        V = Values,
        Int = Time_int
    ) |>
    relocate(
        Int, Date
    )

#              Int       Date    V
# 1  00:00 - 06:00 2022-02-01 2682
# 2  06:00 - 12:00 2022-02-01 2689
# 3  12:00 - 18:00 2022-02-01 2690
# 4  18:00 - 00:00 2022-02-01 2686
# 5  00:00 - 06:00 2022-02-02 2717
# 6  06:00 - 12:00 2022-02-02 2754
# 7  12:00 - 18:00 2022-02-02 2745
# 8  18:00 - 00:00 2022-02-02 2734
# 9  00:00 - 06:00 2022-02-03 2735
# 10 06:00 - 12:00 2022-02-03 2746
# 11 12:00 - 18:00 2022-02-03 2753
# 12 18:00 - 00:00 2022-02-03 2701

CodePudding user response:

Here is how we could do it:

  1. transform matrix class to tibble
  2. create Date column (duplicate of Time_int)
  3. create group with specific feature 00:00:00.000000 and cumsum
  4. extract date character and transform to date class with lubridates dmy function
  5. fill and drop.na
library(tidyverse)
library(lubridate)

df %>% 
  as_tibble() %>% 
  mutate(Date = Time_int) %>% 
  group_by(group_id = cumsum(str_detect(Date, '00:00:00.000000'))) %>% 
  mutate(Date = dmy(str_extract(Date, '\\d{2}\\w \\d{4}'))) %>% 
  fill(Date, .direction = "down") %>% 
  drop_na() %>% 
  ungroup() %>% 
  select(Date, Int = Time_int, V = Values) 
  
  Date       Int           V    
   <date>     <chr>         <chr>
 1 2022-02-01 00:00 - 06:00 2682 
 2 2022-02-01 06:00 - 12:00 2689 
 3 2022-02-01 12:00 - 18:00 2690 
 4 2022-02-01 18:00 - 00:00 2686 
 5 2022-02-02 00:00 - 06:00 2717 
 6 2022-02-02 06:00 - 12:00 2754 
 7 2022-02-02 12:00 - 18:00 2745 
 8 2022-02-02 18:00 - 00:00 2734 
 9 2022-02-03 00:00 - 06:00 2735 
10 2022-02-03 06:00 - 12:00 2746 
11 2022-02-03 12:00 - 18:00 2753 
12 2022-02-03 18:00 - 00:00 2701 
  • Related