I'm trying to insert around 13 million rows into a new table, but I'm getting the following error:
22/12/09 19:33:56 ERROR Utils: Aborting task
java.lang.AssertionError: assertion failed: Created file counter 11 is beyond max value 10
at scala.Predef$.assert(Predef.scala:223)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.$anonfun$increaseCreatedFileAndCheck$1(FileFormatDataWriter.scala:191)
at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.increaseCreatedFileAndCheck(FileFormatDataWriter.scala:188)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.write(FileFormatDataWriter.scala:277)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:280)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:288)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:211)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
22/12/09 19:33:57 ERROR FileFormatWriter: Job job_202212091917352650741377131539872_0020 aborted.
22/12/09 19:33:57 ERROR Executor: Exception in task 0.1 in stage 20.0 (TID 26337)
org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:298)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:211)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.AssertionError: assertion failed: Created file counter 11 is beyond max value 10
at scala.Predef$.assert(Predef.scala:223)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.$anonfun$increaseCreatedFileAndCheck$1(FileFormatDataWriter.scala:191)
at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.increaseCreatedFileAndCheck(FileFormatDataWriter.scala:188)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.write(FileFormatDataWriter.scala:277)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:280)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:288)
The insert statement looks like the following:
insert overwrite table fake_table_txt partition(partition_name)
select id, name, type, description from ( inner query )
I'm a Hadoop beginner and have no idea what might be causing this. Could anybody please point me in the right direction?
CodePudding user response:
After struggling with this for a while, I was told that increasing the "files per task" property would do the trick:
set spark.sql.maxCreatedFilesPerTask = 15;
It previously defaulted to 10.
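For reference, the setting can be applied in the same session before re-running the write. A minimal sketch, reusing the table and partition names from the question (the property name is the one I was given; it may be specific to your Spark distribution rather than a stock Spark setting):

-- raise the per-task created-file limit (property name as given above;
-- it may not exist in stock Spark, so check your distribution's docs)
set spark.sql.maxCreatedFilesPerTask = 15;

-- then re-run the dynamic-partition insert from the question
insert overwrite table fake_table_txt partition(partition_name)
select id, name, type, description from ( inner query );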