Hadoop Spark SQL insert failing


I'm trying to insert around 13M rows into a new table, but I'm getting the following error:

22/12/09 19:33:56 ERROR Utils: Aborting task
java.lang.AssertionError: assertion failed: Created file counter 11 is beyond max value 10
    at scala.Predef$.assert(Predef.scala:223)
    at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.$anonfun$increaseCreatedFileAndCheck$1(FileFormatDataWriter.scala:191)
    at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.increaseCreatedFileAndCheck(FileFormatDataWriter.scala:188)
    at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.write(FileFormatDataWriter.scala:277)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:280)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:288)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:211)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
22/12/09 19:33:57 ERROR FileFormatWriter: Job job_202212091917352650741377131539872_0020 aborted.
22/12/09 19:33:57 ERROR Executor: Exception in task 0.1 in stage 20.0 (TID 26337)
org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:298)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:211)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.AssertionError: assertion failed: Created file counter 11 is beyond max value 10
    at scala.Predef$.assert(Predef.scala:223)
    at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.$anonfun$increaseCreatedFileAndCheck$1(FileFormatDataWriter.scala:191)
    at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.increaseCreatedFileAndCheck(FileFormatDataWriter.scala:188)
    at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.write(FileFormatDataWriter.scala:277)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:280)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:288)

The insert statement looks like the following:

insert overwrite table fake_table_txt partition(partition_name)
select id, name, type, description from ( inner query )
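
For context, a minimal sketch of how such a dynamic-partition insert might be issued from Scala (the table and column names come from the statement above; the SparkSession setup, the source_view name standing in for the inner query, and the trailing partition_name select column are assumptions for illustration):

import org.apache.spark.sql.SparkSession

object DynamicPartitionInsert {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dynamic-partition-insert")
      .enableHiveSupport() // writing to a partitioned Hive table
      .getOrCreate()

    // Allow the partition value to come from the SELECT instead of a literal.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Same shape as the statement in the question; for a dynamic-partition
    // insert the partition column must be the last column of the SELECT.
    spark.sql(
      """
        |INSERT OVERWRITE TABLE fake_table_txt PARTITION (partition_name)
        |SELECT id, name, type, description, partition_name
        |FROM source_view
        |""".stripMargin)

    spark.stop()
  }
}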

I'm a Hadoop beginner and I have no idea what may be causing this. Could anybody please point me in the right direction?

CodePudding user response:

After struggling for a while, I was told that increasing the "max created files per task" property would do the trick:

set spark.sql.maxCreatedFilesPerTask = 15;

The property previously defaulted to 10.
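
A minimal sketch of applying that setting in the same session before rerunning the insert, assuming an existing SparkSession named spark (e.g. in spark-shell). The config key is copied verbatim from above and may be specific to the Spark build in use rather than a stock Apache Spark option; source_view again stands in for the inner query:

// Raise the per-task created-file limit reported in the assertion, then rerun.
spark.sql("SET spark.sql.maxCreatedFilesPerTask = 15")
// Equivalent programmatic form:
// spark.conf.set("spark.sql.maxCreatedFilesPerTask", "15")

spark.sql(
  """
    |INSERT OVERWRITE TABLE fake_table_txt PARTITION (partition_name)
    |SELECT id, name, type, description, partition_name
    |FROM source_view
    |""".stripMargin)

The limit matters here because a task doing a dynamic-partition insert typically opens a new output file for each distinct partition value it writes, so a task that receives rows for more than 10 partitions trips the assertion above.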
