Daily business data is archived incrementally, i.e. with an INSERT INTO on the table. The problem is that Hive does not append to the existing file on each insert; every insert creates a new file instead. For example:
Before the insert:
000000_0
After the insert:
000000_0
000000_0_copy_1
After many inserts, more and more of these files pile up.
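To make the naming pattern concrete, here is a toy Python model (not Hive code) of what the question describes: each INSERT INTO writes one more file into the table directory, and when the name is already taken a `_copy_N` suffix is appended. The function name `insert_into` and the fixed base name `000000_0` are assumptions for illustration; real file names depend on the job's tasks.

```python
# Toy model (not Hive code): each INSERT INTO adds one file to the table
# directory; if the base name is taken, a _copy_N suffix is appended.

def insert_into(table_dir: list[str], base_name: str = "000000_0") -> str:
    """Append a new file name to table_dir, mimicking the _copy_N naming."""
    if base_name not in table_dir:
        name = base_name
    else:
        n = 1
        while f"{base_name}_copy_{n}" in table_dir:
            n += 1  # find the first unused copy number
        name = f"{base_name}_copy_{n}"
    table_dir.append(name)
    return name


if __name__ == "__main__":
    files: list[str] = []
    for _ in range(4):  # four daily incremental inserts
        insert_into(files)
    print(files)
```

Running it prints `['000000_0', '000000_0_copy_1', '000000_0_copy_2', '000000_0_copy_3']`: the file count grows by one per insert.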
The officially documented configuration for merging these small files is as follows:
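<property><name>hive.merge.mapfiles</name><value>true</value></property>
<property><name>hive.merge.mapredfiles</name><value>true</value></property>
<property><name>hive.merge.smallfiles.avgsize</name><value>134217728</value></property>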
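One possible reason the settings seem to have no effect, offered as a reading of the documentation rather than a confirmed diagnosis: the Hive configuration reference describes `hive.merge.smallfiles.avgsize` as the threshold such that, when the average output file size of a job is smaller, Hive starts an extra job to merge that job's output files (`hive.merge.mapfiles` covers map-only jobs, `hive.merge.mapredfiles` map-reduce jobs). As we understand it, files already sitting in the table directory are not part of that check. The toy model below illustrates this; the functions `merge_triggers` and `files_after_insert` are hypothetical, and the merged output is simplified to a single file.

```python
# Toy model of the small-file merge check (assumption: only the current
# job's output files are considered, not files already in the table).
AVGSIZE = 134_217_728  # hive.merge.smallfiles.avgsize from the config (128 MiB)


def merge_triggers(job_output_sizes: list[int], avgsize: int = AVGSIZE) -> bool:
    """True if the average size of this job's output files is below avgsize."""
    if not job_output_sizes:
        return False
    return sum(job_output_sizes) / len(job_output_sizes) < avgsize


def files_after_insert(existing: list[int], job_output_sizes: list[int]) -> int:
    """File count after an INSERT INTO; existing files are never merged here."""
    if merge_triggers(job_output_sizes):
        new_files = 1  # simplification: the job's output is merged into one file
    else:
        new_files = len(job_output_sizes)
    return len(existing) + new_files
```

For example, with five existing ~10 MB files, `files_after_insert([10_000_000] * 5, [10_000_000])` returns 6: the merge condition is met, but the single new file has nothing of its own to merge with.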
But it has no effect; the number of files keeps growing.
Is there any other configuration you would suggest?
Reply:
Create a temporary table as an intermediary, then do the incremental load with a join. I don't know whether that will solve the problem, though.
Reply:
Has this been solved? Same question here.
Reply:
Build a table with a primary key (PK) to store each day's new data. Before each insert, delete the old rows with the same PK, then insert the new data.
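The PK approach in this reply can be sketched as a delete-then-insert on an in-memory list of `(pk, value)` rows. This is a hypothetical Python illustration of the logic only, not HiveQL; the function name `delete_then_insert` and the row shape are assumptions.

```python
# Hypothetical sketch of the reply's PK approach on in-memory rows (not HiveQL).
Row = tuple[str, str]  # (pk, value)


def delete_then_insert(table: list[Row], new_rows: list[Row]) -> list[Row]:
    """Remove rows whose PK appears in new_rows, then append new_rows."""
    new_keys = {pk for pk, _ in new_rows}
    kept = [row for row in table if row[0] not in new_keys]  # step 1: delete old rows by PK
    return kept + new_rows  # step 2: insert the day's new rows


if __name__ == "__main__":
    table = [("a", "1"), ("b", "2")]
    today = [("b", "20"), ("c", "3")]
    print(delete_then_insert(table, today))
```

This prints `[('a', '1'), ('b', '20'), ('c', '3')]`: row `b` is replaced rather than duplicated, so each PK appears once.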