Home > Enterprise >  Specify row delimiter in Redshift COPY command
Specify row delimiter in Redshift COPY command

Time:12-13

I am trying to use the COPY command to import data into Redshift. Unfortunately the data is not sanitized very well and there are CRLF characters in some of the data. This is causing an error because it thinks it is a new record.

I am already using the DELIMITER parameter, but that is setting the delimiter for the fields in each record. Is there a similar way to specify what character(s) are delimiting each record?

CodePudding user response:

No. Redshift expects \n (0x0A) as the End of Record (EOF) and doesn't handle CRLF (0x0D 0x0A). I believe it just sees the CR as another piece of input data but this info cannot be inserted into anything other than a varchar column. If you lines just have CR (0x0D) Redshift won't see an EOF at all and combine rows.

You will need to cleanse your data to remove the CR characters. Each record needs to end with a newline NL (0x0A). (Yes, LF and NL are the same ascii code and just have different names in different applications.) Hopefully you can just remove the CR's but I've seen data with just CR as the EOF and you will need to change these to NL in this case not just remove them.

If your last column of data is a varchar then you can (I believe) just strip the CR character from these strings after it is loaded into Redshift. Otherwise you data needs to be fix before it enters Redshift.

  • Related