CodePudding user response:
O great god, please help!!
CodePudding user response:
Why must the results be inserted into Hive at all? After the Spark SQL job finishes, just keep the results in the Hadoop environment (on HDFS); Hive can then link to them through an external table.
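A minimal sketch of that approach, assuming a Hive-enabled SparkSession; the query, the HDFS path, and the table/column names are placeholders, not anything from this thread:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("keep-results-on-hdfs")
  .enableHiveSupport() // lets spark.sql() talk to the Hive metastore
  .getOrCreate()

// Run the query and leave the result on HDFS as Parquet files.
val result = spark.sql("SELECT id, name FROM source_table") // placeholder query
result.write.mode("overwrite").parquet("hdfs:///warehouse/results/my_result")

// Point a Hive EXTERNAL table at those files; Hive stores only the
// schema and location, and dropping the table leaves the data in place.
spark.sql(
  """CREATE EXTERNAL TABLE IF NOT EXISTS my_result (id BIGINT, name STRING)
    |STORED AS PARQUET
    |LOCATION 'hdfs:///warehouse/results/my_result'""".stripMargin)
```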
CodePudding user response:
Register the DataFrame as a temporary view and repartition it down to fewer partitions; each file in the Hive table then holds roughly one partition's worth of data, and without so many small files the write should be fast.
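A sketch of that idea; the source table, view name, target table, and the coalesce count of 8 are all assumptions chosen for illustration:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("compact-hive-write")
  .enableHiveSupport()
  .getOrCreate()

val df = spark.table("staging.events") // hypothetical source table

// Register a temporary view in case downstream SQL needs it.
df.createOrReplaceTempView("events_tmp")

// coalesce() merges partitions without a full shuffle; the write then
// emits one file per remaining partition instead of many small ones.
spark.table("events_tmp")
  .coalesce(8) // example target partition count
  .write
  .mode(SaveMode.Append)
  .insertInto("ods.events") // hypothetical Hive target table
```

Use coalesce() when only shrinking the partition count; repartition() also works but triggers a full shuffle.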
CodePudding user response:
Insert in bulk and reduce the small-file fragmentation.