Our environment/versions: Hadoop 3.2.3, Hive 3.1.3, Spark 2.3.0.
Our internal table in Hive is defined as:
CREATE TABLE dw.CLIENT
(
    client_id          integer,
    client_abbrev      string,
    client_name        string,
    effective_start_ts timestamp,
    effective_end_ts   timestamp,
    active_flag        string,
    record_version     integer
)
STORED AS ORC TBLPROPERTIES ('transactional'='true');
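As a sanity check (a minimal sketch, assuming a standard Hive 3 ACID configuration; not part of the original report), the transactional setup can be confirmed from Hive itself:

DESCRIBE FORMATTED dw.CLIENT;  -- Table Parameters should include transactional=true
SHOW TBLPROPERTIES dw.CLIENT;  -- same properties in list form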
The external table is defined as:
CREATE EXTERNAL TABLE ClientProcess_21
(
    ClientId string, ClientDescription string, IsActive string, OldClientId string, NewClientId string, Description string,
    TinyName string, FinanceCode string, ParentClientId string, ClientStatus string, FSPortalClientId string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '.../client_extract_20220801.csv/'
TBLPROPERTIES ("skip.header.line.count"="1");
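To verify the CSV parses as expected (a hypothetical check, assuming the extract file is already at the table's LOCATION):

SELECT ClientId, ClientDescription, IsActive
FROM ClientProcess_21
LIMIT 5;  -- the header row should be skipped via skip.header.line.count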
I can select from both tables.
The internal table is empty. When I try joining them:
select
null, s.*
from ClientProcess_21 s
join dw.client t
on s.ClientId = t.client_id
Hive fails with:
SQL Error [3] [42000]: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed during runtime. Please check stacktrace for the root cause.

Partial stack trace from the Hive log:

2022-08-01T18:53:39,012 INFO  [RPC-Handler-1] client.SparkClientImpl: Received result for 07a38056-5ba8-45e0-8783-397f25f398cb
2022-08-01T18:53:39,219 ERROR [HiveServer2-Background-Pool: Thread-1667] status.SparkJobMonitor: Job failed with java.lang.NoSuchMethodError: org.apache.orc.OrcFile$WriterOptions.useUTCTimestamp(Z)Lorg/apache/orc/OrcFile$WriterOptions;
    at org.apache.hadoop.hive.ql.io.orc.OrcFile$WriterOptions.useUTCTimestamp(OrcFile.java:286)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile$WriterOptions.<init>(OrcFile.java:113)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.writerOptions(OrcFile.java:317)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getOptions(OrcOutputFormat.java:126)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getHiveRecordWriter(OrcOutputFormat.java:184)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getHiveRecordWriter(OrcOutputFormat.java:61)
    at org.apache.hadoop.hive.ql.exec.Utilities.createEmptyFile(Utilities.java:3458)
    at org.apache.hadoop.hive.ql.exec.Utilities.createDummyFileForEmptyPartition(Utilities.java:3489)
    at org.apache.hadoop.hive.ql.exec.Utilities.access$300(Utilities.java:222)
    at org.apache.hadoop.hive.ql.exec.Utilities$GetInputPathsCallable.call(Utilities.java:3433)
    at org.apache.hadoop.hive.ql.exec.Utilities.getInputPaths(Utilities.java:3370)
    at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.cloneJobConf(SparkPlanGenerator.java:318)
    at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:241)
    at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:113)
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:359)
    at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:378)
    at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:343)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
java.lang.NoSuchMethodError: org.apache.orc.OrcFile$WriterOptions.useUTCTimestamp(Z)Lorg/apache/orc/OrcFile$WriterOptions;
    at org.apache.hadoop.hive.ql.io.orc.OrcFile$WriterOptions.useUTCTimestamp(OrcFile.java:286)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile$WriterOptions.<init>(OrcFile.java:113)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.writerOptions(OrcFile.java:317)
    at org.apache.hadoop.hive.q
UPDATE: DML statements (e.g. UPDATE) on tables defined as ...stored as orc tblproperties ('transactional'='true'); are failing with:

2022-08-02 09:47:42 ERROR SparkJobMonitor:1250 - Job failed with java.lang.NoSuchMethodError: org.apache.orc.OrcFile$WriterOptions.useUTCTimestamp(Z)Lorg/apache/orc/OrcFile$WriterOptions;
java.util.concurrent.ExecutionException: Exception thrown by job
..
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 10.222.108.202, executor 0): java.lang.RuntimeException: Error processing row: java.lang.NoSuchMethodError: org.apache.orc.OrcFile$WriterOptions.useUTCTimestamp(Z)Lorg/apache/orc/OrcFile$WriterOptions;
    at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:149)
..
Caused by: java.lang.NoSuchMethodError: org.apache.orc.OrcFile$WriterOptions.useUTCTimestamp(Z)Lorg/apache/orc/OrcFile$WriterOptions;
    at org.apache.hadoop.hive.ql.io.orc.OrcFile$WriterOptions.useUTCTimestamp(OrcFile.java:286)
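For illustration, this is the shape of statement that fails (the values are hypothetical, not from the original report; any write to the ACID ORC table goes through the same failing ORC writer path):

UPDATE dw.CLIENT
SET active_flag = 'N',
    record_version = record_version + 1  -- hypothetical values
WHERE client_id = 1;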
CodePudding user response:
I think this is related to data type conversion when joining: one join column is a string and the other is an int. Can you please try this:
select
null, s.*
from ClientProcess_21 s
join dw.client t
on s.ClientId = cast(t.client_id as string) -- cast it to string
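Alternatively (an equivalent sketch, not from the original answer), the string side can be cast instead, assuming ClientId only ever holds numeric values:

select
null, s.*
from ClientProcess_21 s
join dw.client t
on cast(s.ClientId as int) = t.client_id -- non-numeric ClientId values become NULL and drop out of the join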
CodePudding user response:
Resolved by copying the ORC jars from Hive to the Spark home. (Spark 2.3.0 ships older ORC jars that predate OrcFile$WriterOptions.useUTCTimestamp, which Hive 3.1.3 calls, hence the NoSuchMethodError.)

cp $HIVE_HOME/lib/orc* $SPARK_HOME/jars/
cp $HIVE_HOME/lib/hive-storage-api-2.7.0.jar $SPARK_HOME/jars/
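To confirm the copy took effect (paths assume a standard Hive 3.1.3 / Spark 2.3.0 layout; adjust for your install):

ls $HIVE_HOME/lib/orc*.jar                                             # the Hive-shipped ORC jars being copied
ls $SPARK_HOME/jars/orc*.jar $SPARK_HOME/jars/hive-storage-api*.jar    # should now show the Hive versions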