Here is my issue: I am using reticulate to source Python data frames from a database. One of the variables is in a date format. When I convert the data frame from Python to R, the date variable becomes a list column, and every entry shows as <dattm.dt> (see the dput below). I have been dealing with the problem as follows:
library(tidyverse); library(reticulate); library(lubridate)
date_strings <- x %>% pull(date_object) ## Retrieve the list of date objects
fixed_dates <- sapply(seq_along(date_strings), function(j) {
  py_to_r(date_strings[[j]])  ## Convert each entry individually
}) %>% as_date()              ## sapply() drops the Date class, so restore it
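To see why the trailing `as_date()` step is needed, here is a minimal pure-R illustration. It assumes each `py_to_r()` call returns a length-one `Date`; the `dates` list is a hypothetical stand-in for those results, so no Python is involved.

```r
# Hypothetical stand-in for the py_to_r() results: a list of length-one Dates
dates <- list(as.Date("2021-01-01"), as.Date("2021-01-02"), as.Date("2021-01-03"))

# sapply() simplifies its result with unlist(), which drops the "Date" class
simplified <- sapply(dates, identity)
class(simplified)   # "numeric"
simplified          # 18628 18629 18630 (days since 1970-01-01)

# The bare numbers are days since the epoch, so they can be turned back into Dates
restored <- as.Date(simplified, origin = "1970-01-01")
restored            # "2021-01-01" "2021-01-02" "2021-01-03"
```

lubridate's `as_date()` uses the same 1970-01-01 origin for numeric input by default, which is why the pipeline above recovers the correct dates.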
##Dput below
structure(list(date_object = list(<environment>, <environment>,
<environment>, <environment>, <environment>, <environment>,
<environment>, <environment>, <environment>, <environment>,
<environment>, <environment>, <environment>, <environment>,
<environment>, <environment>, <environment>, <environment>,
<environment>, <environment>), metric = c(0.216754862863576,
-0.542492572263425, 0.891144645072327, 0.595980577187475, 1.63561800111297,
0.689275441919723, -1.28124663010116, -0.213144519278363, 1.89653987190927,
1.77686321368272, 0.566604498180317, 0.01571945400457, 0.383057338517151,
-0.0451371159133086, 0.0343519073969926, 0.169026774218306, 1.16502683902767,
-0.0442039972520874, -0.100368442585905, -0.283444568873591)), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"), pandas.index = <environment>)
Here are the first few elements of the date_strings object:
[[1]]
<environment: 0x7f904dc4d5b8>
attr(,"class")
[1] "datetime.date" "python.builtin.object"
[[2]]
<environment: 0x7f904dc4d430>
attr(,"class")
[1] "datetime.date" "python.builtin.object"
[[3]]
<environment: 0x7f904dc4d318>
attr(,"class")
[1] "datetime.date" "python.builtin.object"
While this approach works well for small datasets, it takes a really long time when the data frame is big (think thousands of rows). Is there a way to optimize the process or to vectorize it?
Answer:
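We may use `lapply` instead of `sapply` and combine the results into a vector with `c` via `do.call`. The reason is that when the converted values are of class `Date`, `c` keeps that class, whereas `sapply` simplifies its result with `unlist`, which drops the class and leaves plain numbers (days since 1970-01-01).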
do.call(c, lapply(seq_along(date_strings),
function(j) py_to_r(date_strings[[j]])))
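The `lapply()` version fixes the class problem, but it still makes one `py_to_r()` call per row, so the per-element trips between R and Python remain. A different option, sketched here under the assumption that pandas is installed in the Python environment reticulate uses, is to convert the column on the Python side before the data frame reaches R: `pd.to_datetime()` followed by `.dt.strftime()` turns the column of `datetime.date` objects into ISO date strings in one pandas call, reticulate converts those to an ordinary character vector, and `as.Date()` parses them in one vectorized step. The toy `df` is a hypothetical stand-in for the database result.

```r
library(reticulate)

# Hypothetical stand-in for the database result: a pandas DataFrame whose
# "date_object" column holds datetime.date objects (pandas dtype "object").
py_run_string("
import datetime
import pandas as pd
df = pd.DataFrame({
    'date_object': [datetime.date(2021, 1, d) for d in range(1, 4)],
    'metric': [0.1, 0.2, 0.3],
})
# Convert the whole column at once: date objects -> datetime64 -> 'YYYY-MM-DD' strings
df['date_object'] = pd.to_datetime(df['date_object']).dt.strftime('%Y-%m-%d')
")

x <- py$df                                # string column arrives as a character vector
x$date_object <- as.Date(x$date_object)   # vectorized parse of ISO date strings
```

Going through strings rather than letting reticulate convert the `datetime64` column to `POSIXct` avoids any time-zone shift when the values are later truncated to dates.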