I have a DataFrame named Data2 and I want to insert its values into a PostgreSQL table. I can't use to_sql because some of the values in Data2 are NumPy arrays.
This is Data2's schema:
cursor.execute(
    """
    DROP TABLE IF EXISTS Data2;
    CREATE TABLE Data2 (
        time timestamp without time zone,
        u    bytea,
        v    bytea,
        w    bytea,
        spd  bytea,
        dir  bytea,
        temp bytea
    );
    """
)
My code segment:
for col in Data2_mcw.columns:
    for row in Data2_mcw.index:
        value = Data2_mcw[col].loc[row]
        if type(value).__module__ == np.__name__:
            value = pickle.dumps(value)
        cursor.execute(
            """
            INSERT INTO Data2_mcw(%s)
            VALUES (%s)
            """,
            (col.replace('\"', ''), value)
        )
Error generated:
psycopg2.errors.SyntaxError: syntax error at or near "'time'"
LINE 2: INSERT INTO Data2_mcw('time')
How do I rectify this error?
Any help would be much appreciated!
Answer:
I see two problems with this code.
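The first problem is that you cannot use bind parameters for column names (or any other identifiers), only for values. psycopg2 substitutes the column name as a quoted string literal, which is why PostgreSQL reports a syntax error at 'time'. So the first of the two %s placeholders in your SQL string is invalid. You will have to put the column name into the SQL string yourself, for example with an f-string (assuming you are using Python 3.6 or later):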
cursor.execute(
    f"""
    INSERT INTO Data2_mcw({col})
    VALUES (%s)
    """,
    (value,)
)
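Interpolating col directly is fine when the column names come from your own DataFrame, but it is unsafe for untrusted input and breaks on names that need quoting. psycopg2 ships a psycopg2.sql module (sql.SQL and sql.Identifier) for composing identifiers safely. As a rough, standard-library-only illustration of what PostgreSQL identifier quoting does, here is a hypothetical helper, quote_identifier, that wraps a name in double quotes and doubles any embedded double quote. It is a sketch, not a substitute for sql.Identifier. Note that quoted identifiers are case-sensitive in PostgreSQL, so "Time" and "time" are different columns.

```python
def quote_identifier(name: str) -> str:
    """Quote a PostgreSQL identifier (hypothetical helper, for illustration only).

    Wraps the name in double quotes and doubles any embedded double quote,
    which is how PostgreSQL escapes quotes inside a quoted identifier.
    """
    return '"' + name.replace('"', '""') + '"'


def build_single_column_insert(col: str) -> str:
    """Build an INSERT for one column; the value itself still goes through a %s bind parameter."""
    return f"INSERT INTO Data2_mcw ({quote_identifier(col)}) VALUES (%s)"


print(build_single_column_insert("time"))
# INSERT INTO Data2_mcw ("time") VALUES (%s)
```

With psycopg2 itself, the equivalent would be sql.SQL("INSERT INTO Data2_mcw ({}) VALUES (%s)").format(sql.Identifier(col)), passed to cursor.execute together with (value,).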
The second problem is that a SQL INSERT statement inserts an entire row. It does not insert a single value into an already-existing row, as you seem to expect it to.
Suppose your DataFrame Data2_mcw looks like this:
   a  b  c
0  1  2  7
1  3  4  9
Clearly, this dataframe has six values in it. If you were to run your code on this dataframe, then it would insert six rows into your database table, one for each value, and the data in your table would look like the following:
a     b     c
1     NULL  NULL
3     NULL  NULL
NULL  2     NULL
NULL  4     NULL
NULL  NULL  7
NULL  NULL  9
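To see this concretely without a PostgreSQL server, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in (sqlite3 uses ? placeholders where psycopg2 uses %s; for this purpose the INSERT semantics are the same). The dict named frame is a hypothetical plain-Python substitute for the DataFrame above.

```python
import sqlite3

# Hypothetical stand-in for the DataFrame above: column name -> values in index order.
frame = {"a": [1, 3], "b": [2, 4], "c": [7, 9]}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")

# Same loop shape as the question: one INSERT per individual value.
for col, column_values in frame.items():
    for value in column_values:
        # The (trusted) column name is interpolated; only the value is bound.
        conn.execute(f"INSERT INTO t ({col}) VALUES (?)", (value,))

rows = conn.execute("SELECT a, b, c FROM t ORDER BY rowid").fetchall()
for r in rows:
    print(r)
# (1, None, None)
# (3, None, None)
# (None, 2, None)
# (None, 4, None)
# (None, None, 7)
# (None, None, 9)
```

Every INSERT creates a new row, and each column not named in that INSERT is left NULL.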
I'm guessing you don't want this: you'd rather your database table contained the following two rows instead:
a b c
1 2 7
3 4 9
Instead of inserting one value at a time, you will have to insert one entire row at a time. This means you have to swap your two loops around, build the SQL string once beforehand, and collect all the values for a row before passing them to the database. Something like the following should hopefully work (please note that I don't have a Postgres database to test this against):
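import pickle

import numpy as np

# Build the column list and one %s placeholder per column once, up front.
column_names = ",".join(Data2_mcw.columns)
placeholders = ",".join(["%s"] * len(Data2_mcw.columns))
sql = f"INSERT INTO Data2_mcw({column_names}) VALUES ({placeholders})"

for row in Data2_mcw.index:
    # Collect every value in this row, pickling NumPy values (arrays and NumPy scalars) into bytes.
    values = []
    for col in Data2_mcw.columns:
        value = Data2_mcw[col].loc[row]
        if type(value).__module__ == np.__name__:
            value = pickle.dumps(value)
        values.append(value)
    cursor.execute(sql, values)

# psycopg2 does not autocommit by default; commit on the connection that owns this cursor.
cursor.connection.commit()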
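To check the row-at-a-time approach end to end without a PostgreSQL server, here is a sketch using Python's built-in sqlite3 module as a stand-in (it uses ? placeholders where psycopg2 uses %s, and BLOB where PostgreSQL uses bytea). It also reads the pickled values back with pickle.loads. Assumptions: frame is a hypothetical plain-Python substitute for Data2_mcw, and Python lists stand in for NumPy arrays so the example needs only the standard library.

```python
import pickle
import sqlite3

# Hypothetical stand-in for Data2_mcw; the "u" values play the role of NumPy arrays.
frame = {
    "time": ["2021-01-01 00:00:00", "2021-01-01 00:10:00"],
    "u": [[1.0, 2.0], [3.0, 4.0]],
}
columns = list(frame)
n_rows = len(frame[columns[0]])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data2_mcw (time TEXT, u BLOB)")

# Build the statement once: every column name and one placeholder per column.
sql = f"INSERT INTO data2_mcw ({','.join(columns)}) VALUES ({','.join(['?'] * len(columns))})"

for i in range(n_rows):
    values = []
    for col in columns:
        value = frame[col][i]
        if isinstance(value, list):  # stand-in for the NumPy type check
            value = pickle.dumps(value)
        values.append(value)
    conn.execute(sql, values)  # one INSERT per row, not per value
conn.commit()

# Read the rows back and unpickle the binary column.
stored = conn.execute("SELECT time, u FROM data2_mcw ORDER BY rowid").fetchall()
decoded = [(t, pickle.loads(blob)) for t, blob in stored]
print(decoded)
```

With psycopg2 the loop is the same apart from the %s placeholders. If one round trip per row becomes slow, the DB-API method cursor.executemany(sql, rows) can take a list of row sequences instead.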