Spark 3.2.1: Apache Spark table incompatible data type with parquet


Situation:

  1. A parquet file is generated for me every X amount of time. I can't change the file's column types or the parquet schema, and I can't modify and rewrite the parquet to a new location, because it has to be picked up from where it lands. The process that generates the parquet file can't/won't be changed.

  2. I'm using Databricks with Spark 3.2.1 and trying to create a table that points to the parquet file from (1) with the following code:

    create database if not exists sampledb;
    drop table if exists sampledb.table;
    create table sampledb.table (ID BIGINT, Column1 string) 
    using parquet
    OPTIONS(path='/path/to/parquet/');
    
    
  3. I get the following error:

    com.databricks.sql.io.FileReadException: Error while reading file ........
    Parquet column cannot be converted. Column: [ID], Expected: LongType, Found: INT32
    

What data type should I use when specifying the Spark table schema so it can read the parquet file? I'm open to using Scala, PySpark and/or Python if needed.

CodePudding user response:

BIGINT is an alias for Spark's LongType, but the error shows that the ID column is physically stored in the parquet file as int32. Spark's parquet reader does not widen int32 to a 64-bit type at scan time, so the declared column type has to match the file's physical type. Use the INT type for the ID column.
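If you want to confirm what the file actually contains before changing the DDL, a minimal PySpark check works (the path is the placeholder from the question; on Databricks a `spark` session is already defined in notebooks):

    from pyspark.sql import SparkSession

    # Outside a Databricks notebook, create the session yourself;
    # inside a notebook, `spark` already exists.
    spark = SparkSession.builder.getOrCreate()

    # Let Spark infer the schema from the parquet file's own metadata.
    spark.read.parquet("/path/to/parquet/").printSchema()
    # Given the error above, ID should print as `integer`, not `long`.

With the physical type confirmed, the corrected DDL is: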

    drop table if exists sampledb.table;
    create table sampledb.table (ID INT, Column1 string)
    using parquet
    OPTIONS(path='/path/to/parquet/');
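If consumers of the table still need a 64-bit ID, one workaround (a sketch beyond the original answer; the view name `table_bigint` is made up) is to cast after reading rather than in the table schema:

    from pyspark.sql import functions as F

    # Read through the table (declared with INT, matching the file) and
    # widen ID to bigint for downstream use.
    df = spark.table("sampledb.table").withColumn("ID", F.col("ID").cast("bigint"))
    df.createOrReplaceTempView("table_bigint")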