Split data.table rows based on a string pattern

Time:08-05

I have the following data example:

structure(list(name = c("2020-12-02 02_05_24.143926", "2020-12-02 04_05_44.370258", 
"2020-12-02 08_06_25.214121", "2020-12-02 10_06_45.697784", "2020-12-02 14_07_25.747003", 
"2020-12-02 16_07_46.002571", "2020-12-02 20_08_25.838364", "2020-12-02 22_08_45.705227", 
"2020-12-03 02_09_25.384941", "2020-12-03 04_09_44.709639", "2020-12-03 08_10_23.097440", 
"2020-12-03 10_10_42.111583", "2020-12-03 14_11_20.193122", "2020-12-03 16_11_39.252692", 
"2020-12-03 20_12_17.340138", "2020-12-03 22_12_36.086608", "2020-12-04 02_15_27.387402", 
"2020-12-04 04_15_46.375845", "2020-12-04 08_16_24.414194", "2020-12-04 10_16_43.215919", 
"2020-12-31 10_06_26.083394", "2020-12-31 10_36_30.720992", "2020-12-31 14_07_03.081910", 
"2020-12-31 14_37_07.718933", "2020-12-31 16_07_21.515981", "2020-12-31 16_37_26.054783", 
"2020-12-31 20_07_58.646942", "2020-12-31 20_38_03.155509", "2020-12-31 22_08_17.181192", 
"2020-12-31 22_38_21.847135", "2021-01-01 02_08_54.245043", "2021-01-01 02_38_58.905204", 
"2021-01-01 04_09_13.055522", "2021-01-01 04_39_17.797032", "2021-01-01 08_09_50.080337", 
"2021-01-01 08_39_54.646102", "2021-01-01 10_10_08.580802", "2021-01-01 10_40_13.262391", 
"2021-01-01 14_10_45.513987", "2021-01-01 14_40_50.152527", "2021-01-01 16_11_03.966316", 
"2021-01-01 16_41_08.595758", "2021-01-01 20_11_41.136895", "2021-01-01 20_41_45.807547", 
"2021-01-01 22_11_59.897654", "2021-01-01 22_42_04.619130", "2021-01-02 02_12_37.503054", 
"2021-01-02 02_42_42.155622", "2021-01-02 04_12_56.127958", "2021-01-02 04_43_00.807846", 
"2021-01-02 08_13_33.280704")), row.names = c(NA, -51L), class = c("data.table", 
"data.frame"))

This data consists of dates and times (it is not necessary to parse them as date-time values). However, I would like to split it by specific dates, for example into three data.tables: rows before 2020-12-31, rows between 2020-12-31 and 2021-01-01, and rows after 2021-01-01.

Thanks all

CodePudding user response:

One possible way to solve your problem:

library(data.table)

breaks = as.Date(c("2020-12-31", "2021-01-01"))
split(df, findInterval(as.Date(substr(df$name, 1, 10)), breaks))

$`0`
                          name
                        <char>
 1: 2020-12-02 02_05_24.143926
 2: 2020-12-02 04_05_44.370258
 3: 2020-12-02 08_06_25.214121
 4: 2020-12-02 10_06_45.697784
 5: 2020-12-02 14_07_25.747003
 6: 2020-12-02 16_07_46.002571
 7: 2020-12-02 20_08_25.838364
 8: 2020-12-02 22_08_45.705227
 9: 2020-12-03 02_09_25.384941
10: 2020-12-03 04_09_44.709639
11: 2020-12-03 08_10_23.097440
12: 2020-12-03 10_10_42.111583
13: 2020-12-03 14_11_20.193122
14: 2020-12-03 16_11_39.252692
15: 2020-12-03 20_12_17.340138
16: 2020-12-03 22_12_36.086608
17: 2020-12-04 02_15_27.387402
18: 2020-12-04 04_15_46.375845
19: 2020-12-04 08_16_24.414194
20: 2020-12-04 10_16_43.215919
                          name

$`1`
                          name
                        <char>
 1: 2020-12-31 10_06_26.083394
 2: 2020-12-31 10_36_30.720992
 3: 2020-12-31 14_07_03.081910
 4: 2020-12-31 14_37_07.718933
 5: 2020-12-31 16_07_21.515981
 6: 2020-12-31 16_37_26.054783
 7: 2020-12-31 20_07_58.646942
 8: 2020-12-31 20_38_03.155509
 9: 2020-12-31 22_08_17.181192
10: 2020-12-31 22_38_21.847135

$`2`
                          name
                        <char>
 1: 2021-01-01 02_08_54.245043
 2: 2021-01-01 02_38_58.905204
 3: 2021-01-01 04_09_13.055522
 4: 2021-01-01 04_39_17.797032
 5: 2021-01-01 08_09_50.080337
 6: 2021-01-01 08_39_54.646102
 7: 2021-01-01 10_10_08.580802
 8: 2021-01-01 10_40_13.262391
 9: 2021-01-01 14_10_45.513987
10: 2021-01-01 14_40_50.152527
11: 2021-01-01 16_11_03.966316
12: 2021-01-01 16_41_08.595758
13: 2021-01-01 20_11_41.136895
14: 2021-01-01 20_41_45.807547
15: 2021-01-01 22_11_59.897654
16: 2021-01-01 22_42_04.619130
17: 2021-01-02 02_12_37.503054
18: 2021-01-02 02_42_42.155622
19: 2021-01-02 04_12_56.127958
20: 2021-01-02 04_43_00.807846
21: 2021-01-02 08_13_33.280704
                          name
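To make the grouping rule explicit: findInterval() returns, for each date, the number of breaks that are less than or equal to it, which is where the list names 0, 1 and 2 come from. Below is a minimal sketch of the same approach on a four-row subset of the question's data, with readable names attached to the groups. The label strings and the helper variables `day`, `grp`, `labels` and `parts` are made up for illustration and are not part of the original answer.

```r
library(data.table)

# Toy data: four of the timestamps from the question, stored in `df`.
df <- data.table(name = c("2020-12-02 02_05_24.143926",
                          "2020-12-31 10_06_26.083394",
                          "2021-01-01 02_08_54.245043",
                          "2021-01-02 08_13_33.280704"))

breaks <- as.Date(c("2020-12-31", "2021-01-01"))

# The first 10 characters of each name are the date part (YYYY-MM-DD).
day <- as.Date(substr(df$name, 1, 10))

# findInterval() counts how many breaks are <= each date:
# 0 = before 2020-12-31, 1 = on 2020-12-31, 2 = on or after 2021-01-01.
grp <- findInterval(day, breaks)

# Hypothetical labels, used only to give the list elements readable names.
labels <- c("before", "on_2020-12-31", "from_2021-01-01")
parts <- split(df, factor(labels[grp + 1L], levels = labels))
```

Each element of `parts` is itself a data.table, so a single group can be pulled out with, for example, `parts[["before"]]`.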

CodePudding user response:

split(
  DT,
  DT[, fcase(name < "2020-12-31", 1, name <= "2021-01-01", 2, default = 3)]  
)

CodePudding user response:

lubridate is a helpful package for working with dates and times. After saving the given structure to a variable 'dt', the subsets can be generated as follows:

library(lubridate)
library(data.table)
setDT(dt)
dt[,datetime:=ymd_hms(name)]
# tz = "UTC" makes ymd() return a POSIXct, matching the class of datetime
dt1 <- dt[datetime < ymd("2020-12-31", tz = "UTC")]
dt2 <- dt[datetime            
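The snippet above is cut off in the source, so how it continued is unknown. As a hedged sketch of the same filtering idea, the timestamps can also be parsed with base R instead of lubridate and compared against POSIXct cut-offs. The interval boundaries here are an assumption, chosen to reproduce the three groups shown in the first answer. The names `t1`, `t2` and `dt3` are made up for illustration, and the three-row table is a small subset of the question's data.

```r
library(data.table)

# Toy data: one timestamp from each of the three intended groups.
dt <- data.table(name = c("2020-12-02 02_05_24.143926",
                          "2020-12-31 10_06_26.083394",
                          "2021-01-01 02_08_54.245043"))

# Parse with an explicit format: underscores separate hours, minutes and
# seconds, and %OS reads the fractional seconds.
dt[, datetime := as.POSIXct(name, format = "%Y-%m-%d %H_%M_%OS", tz = "UTC")]

# Cut-offs as POSIXct so both sides of each comparison share a class.
t1 <- as.POSIXct("2020-12-31", tz = "UTC")
t2 <- as.POSIXct("2021-01-01", tz = "UTC")

dt1 <- dt[datetime < t1]                   # before 2020-12-31
dt2 <- dt[datetime >= t1 & datetime < t2]  # on 2020-12-31 (assumed boundaries)
dt3 <- dt[datetime >= t2]                  # 2021-01-01 and later
```

Keeping the cut-offs in the same class and time zone as the parsed column avoids comparing a date-time against a plain Date.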