Converting pandas Timestamp() to datetime.datetime() to support peewee DateTimeField()

Time:10-14

ANSWER

Abhi's answer here: https://stackoverflow.com/a/74057074/4907339 provided me a great clue.

peewee.DateTimeField() also accepts a datetime as a string.

This is working:

hdf.time = hdf.time.astype(str)
hdf_dict = hdf.to_dict(orient="records")
db.Candles1Minute.insert_many(hdf_dict).execute()
sqlite> select * from candles1minute limit 1;
id  symbol  time                 open    high    low      close    volume
--  ------  -------------------  ------  ------  -------  -------  ------
1   USDCAD  2022-10-06 17:15:00  1.3744  1.3745  1.37435  1.37435        
sqlite> 
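For anyone who wants to pin the string format explicitly instead of relying on the default rendering of astype(str), here is a minimal sketch using Series.dt.strftime. The one-row frame is rebuilt by hand from the row shown above, and the peewee insert is left out so the snippet runs on its own. Treat it as an illustration, not a tested replacement for the code above.

```python
import pandas as pd

# Hand-built frame mirroring the single row shown in the sqlite output above.
hdf = pd.DataFrame(
    {
        "symbol": ["USDCAD"],
        "time": pd.to_datetime(["2022-10-06 17:15:00"]),
        "open": [1.3744],
        "high": [1.3745],
        "low": [1.37435],
        "close": [1.37435],
    }
)

# Format every timestamp with an explicit pattern, so the stored text
# always has the same "YYYY-MM-DD HH:MM:SS" layout.
hdf["time"] = hdf["time"].dt.strftime("%Y-%m-%d %H:%M:%S")

records = hdf.to_dict(orient="records")
print(records[0]["time"])
```

The resulting list of dicts has the same shape as hdf_dict above, so it should be usable with insert_many() in the same way.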

Original Question

I am trying to use pandas.to_dict() on a dataframe so I can insert the dict into sqlite via the peewee.insert_many() bulk insert operation. In order to do so, I need to convert Timestamp() to datetime.datetime() so it's compatible with peewee.DateTimeField()

Many of the answers I've seen here refer to converting to datetime.date() which isn't what I want.

I also don't want to use to_json(). That will convert Timestamp() to int(), and while that will be compatible with peewee I don't want to store the dates as int.

I have found some answers that describe various uses of to_pydatetime() but I can't seem to get that right, as the results are still Timestamp():

# print(hdf.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   symbol  1 non-null      object        
 1   time    1 non-null      datetime64[ns]
 2   open    1 non-null      float64       
 3   high    1 non-null      float64       
 4   low     1 non-null      float64       
 5   close   1 non-null      float64       
dtypes: datetime64[ns](1), float64(4), object(1)
memory usage: 176.0  bytes
None

# print(hdf.tail(5))
   symbol                time     open     high      low    close
0  USDCAD 2022-10-13 09:20:00  1.39680  1.39685  1.39375  1.39475
1  USDCAD 2022-10-13 09:21:00  1.39475  1.39605  1.39470  1.39580
...

# hdf.time = hdf.time.apply(lambda x: x.to_pydatetime())
# hdf_dict = hdf.to_dict(orient="records")
# print(hdf_dict)
[{'symbol': 'USDCAD', 'time': Timestamp('2022-10-13 09:20:00'), 'open': 1.3968, 'high': 1.39685, 'low': 1.39375, 'close': 1.39475}, {'symbol': 'USDCAD', 'time': Timestamp('2022-10-13 09:21:00'), 'open': 1.39475, 'high': 1.39605, 'low': 1.3947, 'close': 1.3958}]
# db.Candles1Minute.insert_many(hdf_dict).execute()
InterfaceError                            Traceback (most recent call last)
File ~/Library/Caches/pypoetry/virtualenvs/ariobot-bfns45lq-py3.10/lib/python3.10/site-packages/peewee.py:3197, in Database.execute_sql(self, sql, params, commit)
   3196 try:
-> 3197     cursor.execute(sql, params or ())
   3198 except Exception:

InterfaceError: Error binding parameter 1 - probably unsupported type.

Where parameter 1 corresponds to the DateTimeField() in the peewee model declaration:

class Candles1Minute(BaseModel):
    symbol = TextField()
    time = DateTimeField()
    open = FloatField()
    high = FloatField()
    low = FloatField()
    close = FloatField()
    volume = IntegerField(null=True)

    class Meta:
        indexes = ((("symbol", "time"), True),)
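The binding error can be reproduced without pandas or peewee, which may help explain it. As far as I understand, Python's sqlite3 module looks up parameter adapters by the exact type of the value, so a subclass of datetime.datetime (which pandas.Timestamp is) does not pick up the adapter registered for datetime.datetime. In this sketch, StampLike is a hypothetical stand-in for Timestamp, and the candles table and the explicit adapter are assumptions made only for the demonstration.

```python
import datetime
import sqlite3


class StampLike(datetime.datetime):
    """Hypothetical stand-in for pandas.Timestamp (also a datetime subclass)."""


# Register an adapter for the exact type datetime.datetime.
sqlite3.register_adapter(datetime.datetime, lambda d: d.isoformat(sep=" "))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candles (symbol TEXT, time DATETIME)")

# A plain datetime.datetime goes through the adapter and binds fine.
plain = datetime.datetime(2022, 10, 13, 9, 20)
conn.execute("INSERT INTO candles VALUES (?, ?)", ("USDCAD", plain))

# The subclass has no adapter of its own, so binding fails.
subclassed = StampLike(2022, 10, 13, 9, 21)
try:
    conn.execute("INSERT INTO candles VALUES (?, ?)", ("USDCAD", subclassed))
    subclass_bound = True
except sqlite3.Error as exc:
    # InterfaceError or ProgrammingError, depending on the Python version.
    subclass_bound = False
    print(type(exc).__name__, exc)

rows = conn.execute("SELECT symbol, time FROM candles").fetchall()
print(rows)
```

If this reading is right, any fix has to hand sqlite3 either a plain datetime.datetime or a string, which is what the approaches in this thread do.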

There are tens of thousands of rows in the dataframe, so I'd like this conversion to be fast and efficient. I suspect it would be much more so at the pandas level than iterating through the list of dicts and converting each value there.
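One possible way to do the conversion at the pandas level is sketched below. The apply() attempt above seems to fail because pandas infers datetime64 again from the returned values. Building the column with dtype=object should prevent that inference. The frame here is rebuilt by hand from the two rows printed above, and I have not benchmarked this against the string approach.

```python
import datetime

import pandas as pd

# Hand-built frame mirroring the two rows printed in the question.
hdf = pd.DataFrame(
    {
        "symbol": ["USDCAD", "USDCAD"],
        "time": pd.to_datetime(["2022-10-13 09:20:00", "2022-10-13 09:21:00"]),
        "open": [1.39680, 1.39475],
        "high": [1.39685, 1.39605],
        "low": [1.39375, 1.39470],
        "close": [1.39475, 1.39580],
    }
)

# DatetimeIndex.to_pydatetime() returns a NumPy object array
# whose elements are datetime.datetime instances.
py_times = pd.DatetimeIndex(hdf["time"]).to_pydatetime()

# dtype=object keeps pandas from converting the values back to datetime64.
hdf["time"] = pd.Series(py_times, index=hdf.index, dtype=object)

records = hdf.to_dict(orient="records")
print(type(records[0]["time"]), records[0]["time"])
```

The cost is that the time column is now object dtype, so it is best done on a copy right before the insert instead of on a frame you still want to use for datetime operations.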

CodePudding user response:

You could first convert the date column to str. to_json will then keep the date value as written instead of converting it to an integer.

Suppose your dataframe, df, looks like this:

    symbol  time    open    high    low close
0   USDCAD  2022-10-13 09:20:00 1.39680 1.39685 1.3937  1.39475
1   USDCAD  2022-10-13 09:21:00 1.39475 1.39605 1.39470 1.39580

Convert the column to str with the code below:

df['time'] = df['time'].astype(str)
df.to_json()

Below is the output

'{"symbol":{"0":"USDCAD","1":"USDCAD"},"time":{"0":"2022-10-13 09:20:00","1":"2022-10-13 09:21:00"},"open":{"0":"1.39680","1":"1.39475"},"high":{"0":"1.39685","1":"1.39605"},"low":{"0":"1.3937","1":"1.39470"},"close":{"0":"1.39475","1":"1.39580"}}'

If needed, you can iterate over the result and convert the str values back to datetime.
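A minimal sketch of that round trip, using only the standard library. The records list is a hypothetical row-oriented version of the data above, and the format string assumes the "YYYY-MM-DD HH:MM:SS" layout shown in the output.

```python
from datetime import datetime

# Hypothetical records whose "time" values are strings, as in the output above.
records = [
    {"symbol": "USDCAD", "time": "2022-10-13 09:20:00"},
    {"symbol": "USDCAD", "time": "2022-10-13 09:21:00"},
]

# Parse each string back into a datetime.datetime.
for record in records:
    record["time"] = datetime.strptime(record["time"], "%Y-%m-%d %H:%M:%S")

print(records[0]["time"])
```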

CodePudding user response:

A little convoluted, but it seems to get the job done:

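import pandas as pd

def my_to_pydatetime(ts):
    # Build a one-element DatetimeIndex starting at ts, then take its datetime.datetime
    dti = pd.date_range(start=ts, periods=1, freq='D')
    return dti.to_pydatetime()[0]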

my_to_pydatetime(pd.Timestamp(str(20221013)))
datetime.datetime(2022, 10, 13, 0, 0)

%%timeit
my_to_pydatetime(pd.Timestamp(str(20221013)))
110 µs ± 186 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
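For a single value, calling Timestamp.to_pydatetime() directly should give the same kind of result without building a date range first. A small sketch, with an example timestamp taken from the question:

```python
import datetime

import pandas as pd

ts = pd.Timestamp("2022-10-13 09:20:00")

# Timestamp.to_pydatetime() returns a plain datetime.datetime.
dt = ts.to_pydatetime()
print(type(dt), dt)
```

This only covers one scalar at a time. It does not by itself solve the column-level problem in the question, where pandas converts the values back to Timestamp.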