pandas to_sql postgresql with array dtypes

I have a dataframe that holds datetime values and arrays, and possibly other datatypes in the future. I want to write it to PostgreSQL with to_sql, where the datetime column becomes timestamp without time zone and the array columns become bytea, but I have no idea what to pass for the dtype argument.

Is there a way to build the dtype mapping dynamically, based on each dataframe column's datatype?

How the table looks:

CREATE TABLE IF NOT EXISTS data2_mcw_tmp (
      time timestamp without time zone,
      u bytea,
      v bytea,
      w bytea,
      spd bytea,
      dir bytea,
      temp bytea
);
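
For reference, a minimal, purely illustrative dataframe of the same shape could be built like this (the real values come from elsewhere):

import numpy as np
import pandas as pd

# Illustrative only: a datetime column plus numpy-array columns, matching the table above
Data2_mcw = pd.DataFrame({
    "time": pd.to_datetime(["2021-01-01 00:00", "2021-01-01 01:00"]),
    "u":    [np.random.rand(5), np.random.rand(5)],
    "v":    [np.random.rand(5), np.random.rand(5)],
    "spd":  [np.random.rand(5), np.random.rand(5)],
})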

My code so far (after help from user rftr):

import datetime
import pickle

import numpy as np
from sqlalchemy.dialects.postgresql import BYTEA, TIMESTAMP

dtypedict = {}
Data2_mcw_conv = Data2_mcw.copy()

for row in Data2_mcw_conv.index:
    for col in Data2_mcw_conv.columns:

        value = Data2_mcw_conv[col][row]

        try:
            # numpy-backed columns (the arrays): record them as BYTEA and pickle the value
            if type(Data2_mcw_conv[col].iloc[0]).__module__ == np.__name__:

                dtypedict.update({col: BYTEA})

                value = Data2_mcw_conv[col].loc[row]

                print('before: ')
                print(value.flags)
                print('---------------------')

                value = value.copy(order='C')

                print('after: ')
                print(value.flags)
                print('=====================')

                value = pickle.dumps(value)

        except:
            # non-numpy columns: record datetimes as TIMESTAMP
            if isinstance(Data2_mcw_conv[col].iloc[0], datetime.date):

                dtypedict.update({col: TIMESTAMP})

        Data2_mcw_conv[col][row] = value

Data2_mcw_conv.to_sql(name='data2_mcw_tmp', con=conn,
                      if_exists='replace',
                      dtype=dtypedict)

However, I get this error:

Traceback (most recent call last):
  File "C:\Users\myname\Desktop\database\pickletopdb2.py", line 145, in <module>
    postgres_conv()
  File "C:\Users\myname\Desktop\database\pickletopdb2.py", line 124, in postgres_conv
    Data2_mcw_conv.to_sql(name='data2_mcw_tmp',con=conn,
  File "C:\Python38\lib\site-packages\pandas\core\generic.py", line 2778, in to_sql
    sql.to_sql(
  File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 590, in to_sql
    pandas_sql.to_sql(
  File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 1397, in to_sql
    table.insert(chunksize, method=method)
  File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 831, in insert
    exec_insert(conn, keys, chunk_iter)
  File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 748, in _execute_insert
    conn.execute(self.table.insert(), data)
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1286, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "C:\Python38\lib\site-packages\sqlalchemy\sql\elements.py", line 325, in _execute_on_connection
    return connection._execute_clauseelement(
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1478, in _execute_clauseelement
    ret = self._execute_context(
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1842, in _execute_context
    self._handle_dbapi_exception(
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 2027, in _handle_dbapi_exception
    util.raise_(exc_info[1], with_traceback=exc_info[2])
  File "C:\Python38\lib\site-packages\sqlalchemy\util\compat.py", line 207, in raise_
    raise exception
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1779, in _execute_context
    self.dialect.do_executemany(
  File "C:\Python38\lib\site-packages\sqlalchemy\dialects\postgresql\psycopg2.py", line 951, in do_executemany
    context._psycopg2_fetched_rows = xtras.execute_values(
  File "C:\Python38\lib\site-packages\psycopg2\extras.py", line 1267, in execute_values
    parts.append(cur.mogrify(template, args))
ValueError: ndarray is not C-contiguous

The value.flags output before and after value = value.copy(order='C'):

before: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

---------------------
after: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

=====================

Why is this error occurring, and how can I solve it?

CodePudding user response:

Follow the second code snippet of this answer and customize the dtypes to their PostgreSQL equivalents from here. So in your case, e.g.:

from sqlalchemy.dialects.postgresql import BYTEA, TIMESTAMP

def sqlcol(dfparam):
    dtypedict = {}
    for i, j in zip(dfparam.columns, dfparam.dtypes):
        if "datetime" in str(j):
            dtypedict.update({i: TIMESTAMP})
        if "object" in str(j):  # depending on what your other columns' datatypes are
            dtypedict.update({i: BYTEA})
    return dtypedict
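
A minimal sketch of how this could be wired together, assuming conn is the same connection/engine you already use, and pickling the array columns to bytes first so they fall into the object branch:

import pickle
import numpy as np

Data2_mcw_conv = Data2_mcw.copy()

# Serialize each ndarray column to plain bytes; np.ascontiguousarray mirrors
# your value.copy(order='C') step so the stored bytes are C-ordered.
for col in Data2_mcw_conv.columns:
    if isinstance(Data2_mcw_conv[col].iloc[0], np.ndarray):
        Data2_mcw_conv[col] = Data2_mcw_conv[col].map(
            lambda a: pickle.dumps(np.ascontiguousarray(a))
        )

dtypedict = sqlcol(Data2_mcw_conv)

Data2_mcw_conv.to_sql(name='data2_mcw_tmp', con=conn,
                      if_exists='replace',
                      dtype=dtypedict)

Because the values handed to the driver are plain bytes objects rather than ndarrays, psycopg2 can adapt them to bytea directly.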

Notes:

  • According to the docs, TIMESTAMP is without time zone by default (see the sketch after these notes if you ever need the timezone-aware variant).
  • Your bytea columns are usually represented with datatype object in pandas. Take this into consideration if you add further data in the future.
  • Related
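
For example, if you ever do need timezone-aware timestamps, the dialect type takes a flag (a small sketch; the column name is just illustrative):

from sqlalchemy.dialects.postgresql import TIMESTAMP

dtypedict.update({"time": TIMESTAMP(timezone=True)})  # "timestamp with time zone"
# The bare TIMESTAMP used above maps to "timestamp without time zone".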