Home > Enterprise >  Create table Athena ignore comma in the row values
Create table Athena ignore comma in the row values

Time:11-03

I am creating a table in Athena using below scripts

CREATE EXTERNAL TABLE `itcfmetadata`(
  `itcf id` string, 
  `itcf control name` string, 
  `itcf control description` string, 
  `itcf process` string, 
  `standard` string, 
  `controlid` string, 
  `threshold` string, 
  `status` string, 
  `date reported` string, 
  `remediation (accs specific)` string, 
  `aws account id` string, 
  `aws resource id` string, 
  `aws account owner` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
  ESCAPED BY '\\'
  LINES TERMINATED BY '\n'
LOCATION
  's3://itcfmetadata/'
TBLPROPERTIES (  
  'skip.header.line.count'='1');

The S3 source file is csv file. This file is converted from a excel file and this csv file doe snot have comma seperated values, it is more like a excel file. Problem is when any column contains text like "Hi, How are you". It get split into two as there is a comma and "Hi" and "How are you" becomes two value and get split into two rows. How to avoid this using above create scripts ?

CSV File :

enter image description here

CodePudding user response:

Try using

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'

instead of DELIMITED

The DELIMITED deserializer just looks at the delimiters you provide. The csv deserializet will only use those outside a pair of double quotes ".

See the docs: https://docs.aws.amazon.com/athena/latest/ug/csv-serde.html

  • Related