I have a dataframe that holds datetime values and arrays, and possibly other datatypes in the future. I want to write it to PostgreSQL with to_sql, where the datetime column becomes timestamp without time zone and the array columns become bytea, but I have no idea what to pass for the dtype argument.
Is there a way to build the dtype mapping dynamically, based on each dataframe column's datatype?
How the table looks:
CREATE TABLE IF NOT EXISTS data2_mcw_tmp (
    time timestamp without time zone,
    u    bytea,
    v    bytea,
    w    bytea,
    spd  bytea,
    dir  bytea,
    temp bytea
);
My code so far (after help from user rftr):
import datetime
import pickle

import numpy as np
from sqlalchemy.dialects.postgresql import BYTEA, TIMESTAMP

dtypedict = {}
Data2_mcw_conv = Data2_mcw.copy()
for row in Data2_mcw_conv.index:
    for col in Data2_mcw_conv.columns:
        value = Data2_mcw_conv[col][row]
        try:
            # numpy-backed columns -> pickle the array and store as BYTEA
            if type(Data2_mcw_conv[col].iloc[0]).__module__ == np.__name__:
                dtypedict.update({col: BYTEA})
                value = Data2_mcw_conv[col].loc[row]
                print('before: ')
                print(value.flags)
                print('---------------------')
                value = value.copy(order='C')  # force a C-contiguous copy
                print('after: ')
                print(value.flags)
                print('=====================')
                value = pickle.dumps(value)
        except Exception:
            # datetime columns -> TIMESTAMP (without time zone)
            if isinstance(Data2_mcw_conv[col].iloc[0], datetime.date):
                dtypedict.update({col: TIMESTAMP})
        Data2_mcw_conv[col][row] = value

Data2_mcw_conv.to_sql(name='data2_mcw_tmp', con=conn,
                      if_exists='replace',
                      dtype=dtypedict)
However, I get this error:
Traceback (most recent call last):
File "C:\Users\myname\Desktop\database\pickletopdb2.py", line 145, in <module>
postgres_conv()
File "C:\Users\myname\Desktop\database\pickletopdb2.py", line 124, in postgres_conv
Data2_mcw_conv.to_sql(name='data2_mcw_tmp',con=conn,
File "C:\Python38\lib\site-packages\pandas\core\generic.py", line 2778, in to_sql
sql.to_sql(
File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 590, in to_sql
pandas_sql.to_sql(
File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 1397, in to_sql
table.insert(chunksize, method=method)
File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 831, in insert
exec_insert(conn, keys, chunk_iter)
File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 748, in _execute_insert
conn.execute(self.table.insert(), data)
File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1286, in execute
return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
File "C:\Python38\lib\site-packages\sqlalchemy\sql\elements.py", line 325, in _execute_on_connection
return connection._execute_clauseelement(
File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1478, in _execute_clauseelement
ret = self._execute_context(
File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1842, in _execute_context
self._handle_dbapi_exception(
File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 2027, in _handle_dbapi_exception
util.raise_(exc_info[1], with_traceback=exc_info[2])
File "C:\Python38\lib\site-packages\sqlalchemy\util\compat.py", line 207, in raise_
raise exception
File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1779, in _execute_context
self.dialect.do_executemany(
File "C:\Python38\lib\site-packages\sqlalchemy\dialects\postgresql\psycopg2.py", line 951, in do_executemany
context._psycopg2_fetched_rows = xtras.execute_values(
File "C:\Python38\lib\site-packages\psycopg2\extras.py", line 1267, in execute_values
parts.append(cur.mogrify(template, args))
ValueError: ndarray is not C-contiguous
The value.flags output before/after value = value.copy(order='C'):
before:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
---------------------
after:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
=====================
Why is this error occurring, and how can I solve it?
CodePudding user response:
Follow the second code snippet of this answer and customize the dtypes to their PostgreSQL equivalents from here. So in your case, e.g.:
from sqlalchemy.dialects.postgresql import BYTEA, TIMESTAMP
def sqlcol(dfparam):
    dtypedict = {}
    for i, j in zip(dfparam.columns, dfparam.dtypes):
        if "datetime" in str(j):
            dtypedict.update({i: TIMESTAMP})
        if "object" in str(j):  # depending on what your other columns' datatypes are
            dtypedict.update({i: BYTEA})
    return dtypedict
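Putting it together, a minimal sketch (assuming, as in your code, that the frame is named Data2_mcw and conn is an open SQLAlchemy connection): pickle each array cell from a C-contiguous copy, which avoids the "ndarray is not C-contiguous" error, then pass the generated mapping to to_sql. Replacing whole columns with apply also sidesteps the chained assignment Data2_mcw_conv[col][row] = value, which can silently write to a temporary copy and leave the original ndarrays in place.

import pickle

import numpy as np

Data2_mcw_conv = Data2_mcw.copy()
for col in Data2_mcw_conv.columns:
    if isinstance(Data2_mcw_conv[col].iloc[0], np.ndarray):
        # pickle a C-contiguous copy of every array cell
        Data2_mcw_conv[col] = Data2_mcw_conv[col].apply(
            lambda a: pickle.dumps(a.copy(order='C')))

Data2_mcw_conv.to_sql(name='data2_mcw_tmp', con=conn,
                      if_exists='replace',
                      dtype=sqlcol(Data2_mcw_conv))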
Notes:
- According to the docs, TIMESTAMP is without time zone by default.
- Your bytea columns are usually represented with the datatype object in pandas. You should take this into consideration if you add further data in the future.
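To illustrate the second note, a small hypothetical frame mirroring your table layout reports object for the array columns, which is exactly what the "object" branch in sqlcol keys on:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'time': pd.to_datetime(['2021-01-01', '2021-01-02']),
    'u': [np.zeros(3), np.ones(3)],  # ndarray cells are stored with dtype "object"
})
print(df.dtypes)  # time -> datetime64[ns], u -> object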