I have several CSVs with varying field names that I am copying into a Postgres database from an s3 data source. There are quite a few of them that contain empty strings, ""
which I would like to convert to NULLs at import. When I attempt to copy I get an error along the lines of this (same issue for other data types, integer, etc.):
psycopg2.errors.InvalidDatetimeFormat: invalid input syntax for type date: ""
I have tried using FORCE_NULL (field 1, field2, field3)
- and this works for me, except I would like to do FORCE_NULL (*)
and apply to all of the columns as I have A LOT of fields I am bringing in that I'd like this applied to.
Is this available?
This is an example of my csv:
"ABC","tgif","123","","XyZ"
CodePudding user response:
Yes force null is possible. You should try with FORCE_NULL
clause. You can refer the syntax and details here:
FORCE_NULL
Match the specified columns' values against the null string, even if it has been quoted, and if a match is found set the value to NULL. In the default case where the null string is empty, this converts a quoted empty string into NULL. This option is allowed only in COPY FROM, and only when using CSV format.
[PostgresPro Doc][1]
[Postgres Doc][2]
[1]: https://postgrespro.com/docs/postgrespro/14/sql-copy
[2]: https://www.postgresql.org/docs/current/sql-copy.html
CodePudding user response:
Using psycopg2
Copy functions. In this case copy_expert:
cat empty_str.csv
1, ,3,07/22/2
2,test,4,
3,dog,,07/23/2022
create table empty_str_test(id integer, str_fld varchar, int_fld integer, date_fld date);
import psycopg2
con = psycopg2.connect("dbname=test user=postgres host=localhost port=5432")
cur = con.cursor()
with open("empty_str.csv") as csv_file:
cur.copy_expert("COPY empty_str_test FROM STDIN WITH csv", csv_file)
con.commit()
select * from empty_str_test ;
id | str_fld | int_fld | date_fld
---- --------- --------- ------------
1 | | 3 | 2022-07-22
2 | test | 4 |
3 | dog | | 2022-07-23
From here COPY:
NULL
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings. This option is not allowed when using binary format.
copy_expert
allows you specify the CSV format. If you use copy_from
it will use the text format.