data.table does not allow lubridate's fast_strptime

Time:06-26

Say I have this data.table:

library(data.table)
library(lubridate)
df <- data.table(a = c('2022-01-20', '2022-01-21'))
df
            a
1: 2022-01-20
2: 2022-01-21

Note that lubridate's fast_strptime converts this character column to date-times correctly:

fast_strptime(df$a, "%Y-%m-%d")

[1] "2022-01-20 UTC" "2022-01-21 UTC"

But when I try to store the result back in df, data.table gives an error:

df[, a := fast_strptime(a, "%Y-%m-%d") ]

Error in `[.data.table`(df, , `:=`(a, fast_strptime(a, "%Y-%m-%d"))) : 
      Supplied 9 items to be assigned to 2 items of column 'a'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

Looking forward to any ideas. Thank you.

Answer:

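By default, fast_strptime returns a POSIXlt object, which is a list of components (sec, min, hour, ...) under the hood. data.table sees that list rather than a vector of 2 values, hence the "Supplied 9 items" error: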

str(fast_strptime(df$a, "%Y-%m-%d"))
 POSIXlt[1:2], format: "2022-01-20" "2022-01-21"
> unclass(fast_strptime(df$a, "%Y-%m-%d"))
$sec
[1] 0 0

$min
[1] 0 0

$hour
[1] 0 0

$mday
[1] 20 21

$mon
[1] 0 0

$year
[1] 122 122

$wday
[1] NA NA

$yday
[1] NA NA

$isdst
[1] -1

attr(,"tzone")
[1] "UTC"
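The mismatch behind the error can be checked directly. The following is a minimal sketch, assuming lubridate is installed. The variable name `x` is only illustrative, and the exact number of list components may depend on the lubridate version.

```r
library(lubridate)

x <- fast_strptime(c("2022-01-20", "2022-01-21"), "%Y-%m-%d")

class(x)            # includes "POSIXlt"
length(x)           # number of date-times, via the POSIXlt length method
length(unclass(x))  # number of underlying list components (sec, min, hour, ...)
```

The last two lengths differ, which is consistent with data.table reporting more supplied items than rows.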

We need to convert it to POSIXct, which is an atomic vector with one element per row:

df[, a := as.POSIXct(fast_strptime(a, "%Y-%m-%d")) ]

-output

> df
            a
       <POSc>
1: 2022-01-20
2: 2022-01-21
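As a possible alternative to wrapping the call in as.POSIXct, fast_strptime has an `lt` argument (default TRUE). Setting it to FALSE should return POSIXct directly. A minimal sketch, assuming a fresh copy of the example table:

```r
library(data.table)
library(lubridate)

df <- data.table(a = c("2022-01-20", "2022-01-21"))

# lt = FALSE asks fast_strptime for POSIXct instead of the default POSIXlt
df[, a := fast_strptime(a, "%Y-%m-%d", lt = FALSE)]

class(df$a)
```

This keeps the explicit format string while avoiding the intermediate POSIXlt object.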

Instead of converting to POSIXlt and then to POSIXct, we could convert directly to POSIXct with parse_date_time2, which returns POSIXct by default. From ?parse_date_time2:

parse_date_time2() is a fast C parser of numeric orders.

fast_strptime() is a fast C parser of numeric formats only that accepts explicit format arguments, just like base::strptime().

df[, a := parse_date_time2(a, "%Y-%m-%d") ]

-output

> df
            a
       <POSc>
1: 2022-01-20
2: 2022-01-21
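To see whether the two approaches agree, compare the underlying numeric values (seconds since the epoch). This is only a sketch on the example data. The names `x`, `via_lt` and `direct` are illustrative, and the order string "Ymd" is used here in place of the format-style string above.

```r
library(lubridate)

x <- c("2022-01-20", "2022-01-21")

via_lt <- as.POSIXct(fast_strptime(x, "%Y-%m-%d"))  # POSIXlt, then POSIXct
direct <- parse_date_time2(x, "Ymd")                # POSIXct directly

# Compare the numeric values of the two results
all(as.numeric(via_lt) == as.numeric(direct))
```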