ANSWER
Abhi's answer here: https://stackoverflow.com/a/74057074/4907339 provided me a great clue.
peewee.DateTimeField() also accepts a datetime as a string.
This is working:
hdf.time = hdf.time.astype(str)
hdf_dict = hdf.to_dict(orient="records")
db.Candles1Minute.insert_many(hdf_dict).execute()
sqlite> select * from candles1minute limit 1;
id symbol time open high low close volume
-- ------ ------------------- ------ ------ ------- ------- ------
1 USDCAD 2022-10-06 17:15:00 1.3744 1.3745 1.37435 1.37435
sqlite>
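A minimal, self-contained sketch of the same idea follows. It is not the original code: the standard-library sqlite3 module stands in for peewee, and the `candles` table and the sample rows are made up for illustration. It only shows that once the column is cast with astype(str), the records carry plain strings that SQLite binds without complaint.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data shaped like the dataframe in the question.
hdf = pd.DataFrame(
    {
        "symbol": ["USDCAD", "USDCAD"],
        "time": pd.to_datetime(["2022-10-06 17:15:00", "2022-10-06 17:16:00"]),
        "close": [1.37435, 1.37440],
    }
)

# Vectorised conversion at the pandas level: datetime64[ns] -> str.
hdf["time"] = hdf["time"].astype(str)
records = hdf.to_dict(orient="records")

# sqlite3 is used here in place of peewee (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candles (symbol TEXT, time TEXT, close REAL)")
conn.executemany(
    "INSERT INTO candles VALUES (:symbol, :time, :close)", records
)
first_row = conn.execute(
    "SELECT symbol, time, close FROM candles ORDER BY time LIMIT 1"
).fetchone()
conn.close()

print(first_row)
```

If a specific format matters, hdf["time"].dt.strftime("%Y-%m-%d %H:%M:%S") is an alternative to astype(str) that spells the format out explicitly.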
Original Question
I am trying to use pandas.to_dict() on a dataframe so I can insert the dict into sqlite via the peewee.insert_many() bulk insert operation. In order to do so, I need to convert Timestamp() to datetime.datetime() so it's compatible with peewee.DateTimeField().
Many of the answers I've seen here refer to converting to datetime.date(), which isn't what I want.
I also don't want to use to_json(). That will convert Timestamp() to int(), and while that will be compatible with peewee, I don't want to store the dates as int.
I have found some answers that describe various uses of to_pydatetime(), but I can't seem to get that right, as the results are still Timestamp():
# print(hdf.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 symbol 1 non-null object
1 time 1 non-null datetime64[ns]
2 open 1 non-null float64
3 high 1 non-null float64
4 low 1 non-null float64
5 close 1 non-null float64
dtypes: datetime64[ns](1), float64(4), object(1)
memory usage: 176.0 bytes
None
# print(hdf.tail(5))
symbol time open high low close
0 USDCAD 2022-10-13 09:20:00 1.39680 1.39685 1.39375 1.39475
1 USDCAD 2022-10-13 09:21:00 1.39475 1.39605 1.39470 1.39580
...
# hdf.time = hdf.time.apply(lambda x: x.to_pydatetime())
# hdf_dict = hdf.to_dict(orient="records")
# print(hdf_dict)
[{'symbol': 'USDCAD', 'time': Timestamp('2022-10-13 09:20:00'), 'open': 1.3968, 'high': 1.39685, 'low': 1.39375, 'close': 1.39475}, {'symbol': 'USDCAD', 'time': Timestamp('2022-10-13 09:21:00'), 'open': 1.39475, 'high': 1.39605, 'low': 1.3947, 'close': 1.3958}]
# db.Candles1Minute.insert_many(hdf_dict).execute()
InterfaceError Traceback (most recent call last)
File ~/Library/Caches/pypoetry/virtualenvs/ariobot-bfns45lq-py3.10/lib/python3.10/site-packages/peewee.py:3197, in Database.execute_sql(self, sql, params, commit)
3196 try:
-> 3197 cursor.execute(sql, params or ())
3198 except Exception:
InterfaceError: Error binding parameter 1 - probably unsupported type.
Where parameter 1 corresponds to the DateTimeField() in the peewee model declaration:
class Candles1Minute(BaseModel):
    symbol = TextField()
    time = DateTimeField()
    open = FloatField()
    high = FloatField()
    low = FloatField()
    close = FloatField()
    volume = IntegerField(null=True)

    class Meta:
        indexes = ((("symbol", "time"), True),)
There are tens of thousands of rows in the dataframe, so I'd like this conversion to be fast and efficient. I expect it would be much more efficient to do this at the Pandas level than to iterate through the list of dicts and do the conversion there.
CodePudding user response:
You could first convert the date to str; to_json will then retain the date value as-is instead of converting it to an integer.
Below is your dataframe, let's say df:
symbol time open high low close
0 USDCAD 2022-10-13 09:20:00 1.39680 1.39685 1.3937 1.39475
1 USDCAD 2022-10-13 09:21:00 1.39475 1.39605 1.39470 1.39580
Convert the dtype to str with the code below:
df['time'] = df['time'].astype(str)
df.to_json()
Below is the output
'{"symbol":{"0":"USDCAD","1":"USDCAD"},"time":{"0":"2022-10-13 09:20:00","1":"2022-10-13 09:21:00"},"open":{"0":"1.39680","1":"1.39475"},"high":{"0":"1.39685","1":"1.39605"},"low":{"0":"1.3937","1":"1.39470"},"close":{"0":"1.39475","1":"1.39580"}}'
If needed, you can iterate over the records and convert the str values back to datetime.
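As a rough sketch of that last step, the loop below parses the string back into datetime.datetime with the standard library. The `records` list is hypothetical sample data in the shape to_dict(orient="records") would produce, and the format string assumes the "YYYY-MM-DD HH:MM:SS" layout shown in the output above.

```python
from datetime import datetime

# Hypothetical records, shaped like df.to_dict(orient="records") after astype(str).
records = [
    {"symbol": "USDCAD", "time": "2022-10-13 09:20:00", "close": 1.39475},
    {"symbol": "USDCAD", "time": "2022-10-13 09:21:00", "close": 1.39580},
]

# Parse each string back into a datetime.datetime, in place.
for record in records:
    record["time"] = datetime.strptime(record["time"], "%Y-%m-%d %H:%M:%S")

print(records[0]["time"])
```

This is a per-row Python loop, so with tens of thousands of rows it will be slower than a conversion done inside pandas.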
CodePudding user response:
A little convoluted, but it seems to get the job done:
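import pandas as pd

def my_to_pydatetime(ts):
    # Build a one-element DatetimeIndex from ts, then unbox it to datetime.datetime
    dti = pd.date_range(start=ts, periods=1, freq='D')
    return dti.to_pydatetime()[0]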
my_to_pydatetime(pd.Timestamp(str(20221013)))
datetime.datetime(2022, 10, 13, 0, 0)
%%timeit
my_to_pydatetime(pd.Timestamp(str(20221013)))
110 µs ± 186 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
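The same DatetimeIndex.to_pydatetime() call can probably be applied to the whole column at once rather than one Timestamp at a time. The sketch below assumes a dataframe shaped like the one in the question (the sample rows are made up). The object-dtype Series is the important part: as the question's output shows, values put back into a datetime64[ns] column come out of to_dict() as Timestamp again, so the column has to be stored as plain Python objects.

```python
import datetime

import pandas as pd

# Hypothetical sample data shaped like the dataframe in the question.
hdf = pd.DataFrame(
    {
        "symbol": ["USDCAD", "USDCAD"],
        "time": pd.to_datetime(["2022-10-13 09:20:00", "2022-10-13 09:21:00"]),
        "close": [1.39475, 1.39580],
    }
)

# Unbox the whole column to an object array of datetime.datetime values.
py_times = pd.DatetimeIndex(hdf["time"]).to_pydatetime()

# dtype=object keeps pandas from re-inferring datetime64[ns] on assignment.
hdf["time"] = pd.Series(py_times, index=hdf.index, dtype=object)

records = hdf.to_dict(orient="records")
print(type(records[0]["time"]))
```

An object column gives up the vectorised datetime operations, so this conversion is best done as the last step before to_dict().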