I have two Postgres databases that contain the same schema. I am writing a Python script whose goal is to export part of the data from one table and import the result into the same table in the other database (essentially select * from A where f=123). The schema is large (it has many columns of different types, some nullable and some not; there are date columns and string columns that can contain sentences, pseudo-queries and file names) and a table can hold thousands of rows.
I took the approach of exporting the data from the table to a CSV file, then importing the data from the CSV file into the table in the second database. I use the psycopg2 library for working with Postgres in Python, along with the csv module to read and write the CSV files.
I implemented a first version, and the problem was this: some columns in a row are empty. When I read the table data in Python, an empty field comes back as None if the column is nullable, and as an empty string "" if the column is NOT NULL. When the rows are exported to CSV, both None and "" end up as empty fields, so a row looks like 1234,,,,,1,,. And when I try to import the file into the Postgres table, every empty value in the CSV is treated as NULL and inserted that way, which fails for the columns that do not accept NULL. Below you can see my code, and after it the improvement I made to avoid this problem.
import psycopg2
import csv


def export_table(filename, tablename):
    conn = psycopg2.connect(....)
    cur = conn.cursor()
    cur.execute(f'SELECT * FROM {tablename} where f=123')
    rows = cur.fetchall()

    with open(filename, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        for row in rows:
            writer.writerow(row)

    cur.close()
    conn.close()
def import_table(filename, tablename):
    conn = psycopg2.connect(..second db data)
    cur = conn.cursor()

    with open(filename, 'r') as csvfile:
        cur.copy_expert(
            f"COPY {tablename} FROM STDIN WITH (FORMAT CSV)",
            csvfile
        )

    conn.commit()
    cur.close()
    conn.close()
I tried adding csv.QUOTE_MINIMAL and csv.QUOTE_NONNUMERIC - they did not help.
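As far as I can tell, the quoting flags cannot fix the underlying issue: csv.writer writes None as an empty string, so with the default quoting both None and '' produce the same empty, unquoted field, and COPY in CSV format reads an unquoted empty field back as NULL. A minimal illustration (separate from my script):

import csv
import io

# csv.writer serialises None and '' identically, so the NULL / empty-string
# distinction is lost before COPY ever sees the file.
buf = io.StringIO()
csv.writer(buf).writerow([1234, None, ''])   # default QUOTE_MINIMAL
print(repr(buf.getvalue()))                  # '1234,,\r\n' - both empty fields look the same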
Because I was not able to import the data with this code, I tried one more thing: I added a manual function for quoting:
def quote_field(field):
    if isinstance(field, str):
        if field == '':
            return '""'
        elif any(c in field for c in (',', '"', '\n')):
            return '"' + field.replace('"', '""') + '"'
    return field
And updated the export part this way:

with open(filename, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, quoting=csv.QUOTE_NONE, quotechar='', escapechar='\\')
    for row in rows:
        writer.writerow([quote_field(field) for field in row])
I ran the code: it now writes empty-string values to the CSV as "" and None values as plain empty fields, so a row in the CSV looks like 1234,,,"","",,,,,"",,,,,. In some cases this worked and the data was imported correctly, but sometimes the generated CSV is not imported at all, or only partially. To check it I used DataGrip to import the CSV manually; for some data it also imported only partially (around 20 rows out of 1000) and for some data it did not import at all. I checked the CSVs for validity and they were valid.
I think there is a bug in the import part, but I don't know where it is or why it behaves this way. I need help with this.
CodePudding user response:
Create tables:
create table csv_null(id integer not null, fld1 varchar);
insert into csv_null values (1, 'test'), (2, ''), (3, null), (4, 'cat');
create table csv_null_2 as select * from csv_null limit 0;
\pset null NULL
Null display is "NULL".
select * from csv_null;
 id | fld1
----+------
  1 | test
  2 |
  3 | NULL
  4 | cat
Python code:
import io
import psycopg2
con = psycopg2.connect(dbname="test", host='localhost', user='postgres', port=5432)
buffer = io.StringIO()
cur = con.cursor()
# Dump the selected rows as CSV straight from Postgres into the in-memory buffer,
cur.copy_expert('copy (select * from csv_null ) TO STDOUT WITH CSV HEADER', buffer)
buffer.seek(0)
# then feed that same CSV back to Postgres with COPY FROM.
cur.copy_expert('copy csv_null_2 from STDIN WITH CSV HEADER', buffer)
con.commit()
cur.execute("select * from csv_null_2")
cur.fetchall()
[(1, 'test'), (2, ''), (3, None), (4, 'cat')]
In psql:
select * from csv_null_2 ;
 id | fld1
----+------
  1 | test
  2 |
  3 | NULL
  4 | cat
The cur.copy_expert('copy (select * from csv_null ) TO STDOUT WITH CSV HEADER', buffer) call produces exactly the output that cur.copy_expert('copy csv_null_2 from STDIN WITH CSV HEADER', buffer) consumes, because you never jump between the two contexts, Python's csv module and Postgres COPY.
UPDATE
Table altered to have a NOT NULL column:
alter table csv_null add column fld2 varchar not null default '';
update csv_null set fld2 = 'not null' where id in (1,4);
alter table csv_null_2 add column fld2 varchar not null default '';
truncate csv_null_2;
Python code:
buffer = io.StringIO()
cur.copy_expert('copy (select * from csv_null ) TO STDOUT WITH CSV HEADER', buffer)
buffer.seek(0)
cur.copy_expert('copy csv_null_2 from STDIN WITH CSV HEADER', buffer)
con.commit()
cur.execute("select * from csv_null_2")
cur.fetchall()
[(2, '', ''), (3, None, ''), (1, 'test', 'not null'), (4, 'cat', 'not null')]
In psql:
select * from csv_null_2 ;
 id | fld1 | fld2
----+------+----------
  2 |      |
  3 | NULL |
  1 | test | not null
  4 | cat  | not null
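To adapt this to the two-database setup in the question, the same pattern simply spans two connections. A rough sketch, where the connection parameters and tablename are placeholders and f = 123 is the filter from your original SELECT:

import io
import psycopg2

# Placeholders: adjust connection parameters and table name for your setup.
src = psycopg2.connect(dbname="db1", host="localhost", user="postgres", port=5432)
dst = psycopg2.connect(dbname="db2", host="localhost", user="postgres", port=5432)

buffer = io.StringIO()

# Let Postgres itself write the CSV, so NULL and '' keep distinct representations.
with src.cursor() as cur:
    cur.copy_expert(
        "COPY (SELECT * FROM tablename WHERE f = 123) TO STDOUT WITH CSV HEADER",
        buffer,
    )

buffer.seek(0)

# ...and let Postgres read the same CSV back on the target side.
with dst.cursor() as cur:
    cur.copy_expert("COPY tablename FROM STDIN WITH CSV HEADER", buffer)
dst.commit()

src.close()
dst.close()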