Home > Software design >  Parse error with a simple CSV import into Clickhouse
Parse error with a simple CSV import into Clickhouse

Time:04-21

I have an extremely simple CSV I'm trying to import into clickhouse with no success. The create table statement is:

CREATE TABLE staging.EloLBK
(
    `Month` DateTime64(3),
    `1958` Int32,
    `1959` Int32,
    `1960` Int32
)
ENGINE = MergeTree
PRIMARY KEY Month
ORDER BY Month
SETTINGS index_granularity = 8192

The CSV data looks like this:

"Month", "1958", "1959", "1960"
"JAN",  340,  360,  417
"FEB",  318,  342,  391
"MAR",  362,  406,  419
"APR",  348,  396,  461
"MAY",  363,  420,  472
"JUN",  435,  472,  535
"JUL",  491,  548,  622
"AUG",  505,  559,  606
"SEP",  404,  463,  508
"OCT",  359,  407,  461
"NOV",  310,  362,  390
"DEC",  337,  405,  432

My import statement is:

INSERT INTO EloLBK SELECT * FROM file('EloLBK/*.csv', 'CSVWithNames', '"Month" datetime64, "1958" integer, "1959" integer, "1960" integer')

The error coming back from clickhouse is:

Code: 27. DB::Exception: Cannot parse input: expected '"' before: '417\n"FEB",  318,  342,  391\n"MAR",  362,  406,  419\n"APR",  348,  396,  461\n"MAY",  363,  420,  472\n"JUN",  435,  472,  535\n"JUL",  491,  548,  622\n"AUG",  505,':
Row 1:
Column 0,   name: Month, type: DateTime64(3), parsed text: "<DOUBLE QUOTE>JAN<DOUBLE QUOTE>,  340,  360,  "ERROR
Code: 27. DB::ParsingException: Cannot parse input: expected '"' before: '417\n"FEB",  318,  342,  391\n"MAR",  362,  406,  419\n"APR",  348,  396,  461\n"MAY",  363,  420,  472\n"JUN",  435,  472,  535\n"JUL",  491,  548,  622\n"AUG",  505,'. (CANNOT_PARSE_INPUT_ASSERTION_FAILED) (version 21.12.1.8928 (official build))

: While executing CSVRowInputFormat: While executing File. (CANNOT_PARSE_INPUT_ASSERTION_FAILED)

I'm not sure how to resolve this so any advice would be appreciated!

CodePudding user response:

Ok, after some more messing around, it appears the error message is just a bit misleading. The actual issue is that clickhouse is (understandably) unable to parse the month as a datetime.

The following CSV input works fine:

"Month","1958","1959","1960"
"1970-01-01T00:00:00","340","360","417"
"1970-01-02T00:00:00","318","342","391"

CodePudding user response:

You can tell ClickHouse to do a best effort guess by passing the option date_time_input_format='best_effort', for example:

INSERT INTO EloLBK SELECT * FROM file('EloLBK/*.csv', 'CSVWithNames', '"Month" datetime64, "1958" integer, "1959" integer, "1960" integer') settings date_time_input_format='best_effort';

Will lead to:

┌───────────────────Month─┬─1958─┬─1959─┬─1960─┐
│ 2000-01-01 00:00:00.000 │  340 │  360 │  417 │
│ 2000-02-01 00:00:00.000 │  318 │  342 │  391 │
│ 2000-03-01 00:00:00.000 │  362 │  406 │  419 │
│ 2000-04-01 00:00:00.000 │  348 │  396 │  461 │
│ 2000-05-01 00:00:00.000 │  363 │  420 │  472 │
│ 2000-06-01 00:00:00.000 │  435 │  472 │  535 │
│ 2000-07-01 00:00:00.000 │  491 │  548 │  622 │
│ 2000-08-01 00:00:00.000 │  505 │  559 │  606 │
│ 2000-09-01 00:00:00.000 │  404 │  463 │  508 │
│ 2000-10-01 00:00:00.000 │  359 │  407 │  461 │
│ 2000-11-01 00:00:00.000 │  310 │  362 │  390 │
│ 2000-12-01 00:00:00.000 │  337 │  405 │  432 │
└─────────────────────────┴──────┴──────┴──────┘
  • Related