I'm copying a CSV file into Postgres using SQL commands, and the file has two different possibilities for what I'd like to treat as null. The first is just a blank (two sequential commas). The second is the string "PrivacySuppressed".
I've tried importing the CSV like this:
COPY c2c.field FROM 'my_csv.csv' NULL AS 'PrivacySuppressed' HEADER csv;
The problem here is that although PrivacySuppressed is converted to null, the blank is treated as a blank string, and I get the following error:
ERROR: invalid input syntax for type integer: ""
How can you import a CSV like this that has multiple NULL possibilities? I could modify the original CSV by just stripping out the PrivacySuppressed string, but I'd like to keep this in the data (so that if we decide to change the application we'll still have access to it).
CodePudding user response:
An example case:
cat csv_null_test.csv
1,,9
2,"19","5"
3,"PrivacySuppressed","5"
4,"19","15"
5,"19","5"
create table csv_null_test(id integer, fld_1 varchar, fld_2 integer);
\copy csv_null_test from 'csv_null_test.csv' with(format 'csv');
COPY 5
\pset null
Null display is "NULL".
select * from csv_null_test ;
id | fld_1 | fld_2
---- ------------------- -------
1 | NULL | 9
2 | 19 | 5
3 | PrivacySuppressed | 5
4 | 19 | 15
5 | 19 | 5
(5 rows)
update csv_null_test set fld_1 = nullif(fld_1, 'PrivacySuppressed') where fld_1 = 'PrivacySuppressed' ;
UPDATE 1
select * from csv_null_test ;
id | fld_1 | fld_2
---- ------- -------
1 | NULL | 9
2 | 19 | 5
4 | 19 | 15
5 | 19 | 5
3 | NULL | 5