Situation:

(1) I get a parquet file generated for me every X amount of time. I can't change the column type of the file or the parquet schema, I can't modify and rewrite the parquet to a new file location because it has to be picked up from there, and the process for generating the parquet file can't/won't be changed.

(2) I'm using Databricks with Spark 3.2.1 and trying to create a table that points to the parquet file in (1) using the following code:
create database if not exists sampledb;

drop table if exists sampledb.table;

create table sampledb.table (ID BIGINT, Column1 string)
using parquet
OPTIONS(path='/path/to/parquet/');
I get the following error:
com.databricks.sql.io.FileReadException: Error while reading file ........ Parquet column cannot be converted. Column: [ID], Expected: LongType, Found: INT32
What data type should I use when specifying the Spark table schema so it can read the parquet file? I'm open to using Scala, PySpark, and/or Python if needed.
CodePudding user response:
BIGINT is an alias for LongType, but the ID column in the parquet schema is defined as int32, and Spark's parquet reader will not widen a physical int32 to LongType at read time: the declared column type has to match what is in the file. Use the INT type for the ID column.
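If you want to confirm what the file actually contains before declaring the table, you can let Spark infer the schema straight from the parquet. A minimal PySpark sketch, assuming it runs in a Databricks notebook where the spark session is predefined, and reusing the placeholder path from above:

# Infer the schema directly from the parquet file.
# ID should be reported as "integer" (int32), not "long".
df = spark.read.parquet("/path/to/parquet/")
df.printSchema()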
drop table if exists sampledb.table;
create table sampledb.table (ID INT, Column1 string)
using parquet
OPTIONS(path='/path/to/parquet/');
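If downstream code still needs ID as a BIGINT, you can cast after reading instead of in the table definition. A sketch in PySpark under the same assumptions as above; the view name table_bigint is just illustrative:

from pyspark.sql import functions as F

# Read through the INT-typed table, then widen ID to bigint for consumers.
df = spark.table("sampledb.table")
df_long = df.withColumn("ID", F.col("ID").cast("bigint"))
df_long.createOrReplaceTempView("table_bigint")  # hypothetical view name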