Say I have this data.table:
library(data.table)
library(lubridate)
df <- data.table(a = c('2022-01-20', '2022-01-21')); df
a
1: 2022-01-20
2: 2022-01-21
Note that lubridate's fast_strptime() parses this character column into date-times correctly:
fast_strptime(df$a, "%Y-%m-%d")
[1] "2022-01-20 UTC" "2022-01-21 UTC"
But when I try to store the result back into df, data.table gives an error:
df[, a := fast_strptime(a, "%Y-%m-%d") ]
Error in `[.data.table`(df, , `:=`(a, fast_strptime(a, "%Y-%m-%d"))) :
Supplied 9 items to be assigned to 2 items of column 'a'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Looking forward to any ideas. Thank you.
Answer:
fast_strptime(), like base strptime(), returns a POSIXlt object by default, which is a list of date-time components under the hood:
str(fast_strptime(df$a, "%Y-%m-%d"))
POSIXlt[1:2], format: "2022-01-20" "2022-01-21"
> unclass(fast_strptime(df$a, "%Y-%m-%d"))
$sec
[1] 0 0
$min
[1] 0 0
$hour
[1] 0 0
$mday
[1] 20 21
$mon
[1] 0 0
$year
[1] 122 122
$wday
[1] NA NA
$yday
[1] NA NA
$isdst
[1] -1
attr(,"tzone")
[1] "UTC"
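That list has nine components, which is where the "Supplied 9 items to be assigned to 2 items" in the error comes from. To store it as a column we need to convert it to POSIXct, which is an atomic vector with one value per row: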
df[, a := as.POSIXct(fast_strptime(a, "%Y-%m-%d")) ]
-output
> df
a
<POSc>
1: 2022-01-20
2: 2022-01-21
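A possible variation on the same fix, offered as a sketch rather than the only way: fast_strptime() has an lt argument, and setting lt = FALSE asks it to return POSIXct instead of the default POSIXlt, so the as.POSIXct() wrapper is not needed. The example rebuilds the df from the question so it can be run on its own.

```r
library(data.table)
library(lubridate)

# Same toy data as in the question
df <- data.table(a = c("2022-01-20", "2022-01-21"))

# lt = FALSE: return POSIXct (an atomic vector) rather than a POSIXlt list
df[, a := fast_strptime(a, "%Y-%m-%d", lt = FALSE)]

# Check what ended up in the column
class(df$a)
```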
Instead of converting to POSIXlt and then to POSIXct, we could parse directly to POSIXct with parse_date_time2(), which is also fast and returns POSIXct by default. From ?parse_date_time2:
parse_date_time2() is a fast C parser of numeric orders.
fast_strptime() is a fast C parser of numeric formats only that accepts explicit format arguments, just like base::strptime().
df[, a := parse_date_time2(a, "%Y-%m-%d") ]
-output
> df
a
<POSc>
1: 2022-01-20
2: 2022-01-21
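If only the calendar date matters and no time of day is needed, another option may be to skip date-times altogether. This is a minimal sketch that assumes the strings are always in the ISO "YYYY-MM-DD" form shown in the question; it uses data.table's own as.IDate(), which stores the column as an integer-backed date.

```r
library(data.table)

# Same toy data as in the question
df <- data.table(a = c("2022-01-20", "2022-01-21"))

# Convert the character column to data.table's integer-based date class
df[, a := as.IDate(a)]

# The column is now a date, not a date-time
class(df$a)
```

Whether a date or a date-time column is the better fit depends on what is done with it later, for example joins against other POSIXct columns versus grouping by day.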