I am importing a pipe-delimited file from an S3 bucket into Redshift using the COPY command.
I am getting this error: Invalid digit, Value '.', Pos 0, Type: Integer
When I query stl_load_errors I notice that the line_text and field_text fields are as follows:
line_text: ..1
field_text: ..1.
This makes no sense because this is the line in the file:
1|zzDo not use-P|P
I have looked at the file in notepad to show all the characters and I don't see anything odd. I am really confused as to why Redshift is reading this file all wrong.
EDIT: I am using BCP to output the data from a SQL Server table to the pipe delimited file if that matters
CodePudding user response:
I suspect you are dealing with a text encoding issue. You are coming from SQL server so non-UTF-8 encodings (UTF16) are possible (likely) - have you checked the encoding of the file? (Those dots look to represent characters that could not be parsed). There is an option to BCP to output in UTF-8.
See: https://learn.microsoft.com/en-us/answers/questions/184641/using-bcp-export-in-utf-8-format.html
Redshift only reads multi-byte UTF-8. UTF-16 encodings are seen as 2 characters and the first one (and possibly the second one) make no sense. Outputting in UTF-8 should fix this.
However, you MAY run into a second issue - the end-of-record. MS tools like to end lines with CR AND LF while unix/linux tools like just CR. You may get some LF characters in your fields so you will want to check your data after loading. This may not be an issue for you but I'd hate to see you get tripped up by another common MS to Redshift issue.