I have a csv with rows that look the following way:
2021-08-20,2021-10-04,2021-10-04,148355456455712,Accountname,USD,"[{'action_type': 'add_to_cart', 'value': '266.63', '1d_click': '266.63', '7d_click': '266.63'}, {'action_type': 'initiate_checkout', 'value': '213.03', '1d_click': '213.03', '7d_click': '213.03'}, {'action_type': 'view_content', 'value': '762.75', '1d_click': '762.75', '7d_click': '762.75'}, {'action_type': 'omni_add_to_cart', 'value': '266.63', '1d_click': '266.63', '7d_click': '266.63'}, {'action_type': 'omni_initiated_checkout', 'value': '213.03', '1d_click': '213.03', '7d_click': '213.03'}, {'action_type': 'omni_view_content', 'value': '762.75', '1d_click': '762.75', '7d_click': '762.75'}, {'action_type': 'add_to_cart', 'value': '266.63', '1d_click': '266.63', '7d_click': '266.63'}, {'action_type': 'initiate_checkout', 'value': '213.03', '1d_click': '213.03', '7d_click': '213.03'}]"
I am trying to load this CSV to a redshift table with the following schema:
Columns Type Nullable Length Precision
date_start varchar true 256 256
date_stop varchar true 256 256
created_time varchar true 256 256
account_id int8 true 19 19
account_name varchar true 256 256
account_currency varchar true 256 256
action_values varchar true 256 256
I'm using the following DML statement:
copy table_name
from 's3://bucket_name/subdirectory/filename.csv'
delimiter ','
ignoreheader 1
csv quote as '"'
dateformat 'auto'
timeformat 'auto'
access_key_id '...'
secret_access_key '...'
;
and i get this error:
Load into table 'table_name' failed. Check 'stl_load_errors' system table for details.
when I look at stl_load_errors table this is what i see:
query substring line value err_reason
93558 ... 2 2021-08-20 Invalid digit, Value '[', Pos 0, Type: Long
93558 ... 2 2021-10-04 Invalid digit, Value '[', Pos 0, Type: Long
93558 ... 2 2021-10-04 Invalid digit, Value '[', Pos 0, Type: Long
93558 ... 2 148355456455712 Invalid digit, Value '[', Pos 0, Type: Long
93558 ... 2 Accountname Invalid digit, Value '[', Pos 0, Type: Long
93558 ... 2 USD Invalid digit, Value '[', Pos 0, Type: Long
I just cant figure out why it isn't working but i guess it has something to do with the json string. Also I can't understand whare this "Type: Long" is coming from.
I am trying to avoid using Json files as input...
can anyone help?
CodePudding user response:
Your data does look reasonable (as far as I can tell) except that the last field (the json data) is longer than 256 characters. This however is not the error you are showing. The format of stl_load_errors isn't the format the table is in so I assume you are doing some processing on this table in what you are showing in your question. The "Type:Long" is referring to the INT8 of your table DDL - INT8 is also known as a Big Int or a Long Int.
I think your issue is that you haven't specified that the COPY is reading a CSV file and the default format is DELIMITED. For Redshift COPY to follow CSV rules you need to specify that the file format is CSV. Specifically I suspect that the double quotes are providing the data value grouping you are expecting.
CodePudding user response:
I worked it out!
since Redshift is parsing every \'
as a string and every \,
as a delimiter if not within a string. the solution is to replace all \'
with \"
within a json string so it would look the following way:
2021-08-20,2021-10-04,2021-10-04,148355456455712,Accountname,USD,'[{"action_type": "add_to_cart", "value": "266.63", "1d_click": "266.63", "7d_click": "266.63"}, {"action_type": "initiate_checkout", "value": "213.03", "1d_click": "213.03", "7d_click": "213.03"}, {"action_type": "view_content", "value": "762.75", "1d_click": "762.75", "7d_click": "762.75"}, {"action_type": "omni_add_to_cart", "value": "266.63", "1d_click": "266.63", "7d_click": "266.63"}, {"action_type": "omni_initiated_checkout", "value": "213.03", "1d_click": "213.03", "7d_click": "213.03"}, {"action_type": "omni_view_content", "value": "762.75", "1d_click": "762.75", "7d_click": "762.75"}, {"action_type": "add_to_cart", "value": "266.63", "1d_click": "266.63", "7d_click": "266.63"}, {"action_type": "initiate_checkout", "value": "213.03", "1d_click": "213.03", "7d_click": "213.03"}]'
since I sed pandas the preprocessing code was the following:
for col in array_cols:
df[col] = df[col].str.replace('\'','\"')