We are using Apache IoTDB Server version 0.11.2 and observe that the data directory and its TsFiles are far bigger than they should be: about 130 sensors with 4 million double values each, yet the files total about 200 GB.
Are there known issues, or do you have any ideas what could cause this and how to track it down?
The only thing we could think of is merge artifacts: we write many data points out of order, so merging has to happen frequently.
Are there any ideas or tools for debugging or inspecting the TsFiles to get an idea of what is happening here?
Any help or hints are appreciated!
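For scale, here is a rough back-of-envelope estimate of the uncompressed payload. The sensor and point counts come from the description above; the 8-byte timestamp per point and reading "200 GB" as 200 × 10^9 bytes are assumptions, and encoding or compression would normally make the real files smaller than this figure.

```python
# Back-of-envelope estimate of the raw data volume (no encoding, no compression).
SENSORS = 130
POINTS_PER_SENSOR = 4_000_000
BYTES_PER_DOUBLE = 8
BYTES_PER_TIMESTAMP = 8  # assumption: one 64-bit timestamp stored per value


def raw_size_bytes(sensors, points, bytes_per_point):
    """Total bytes if every point is stored exactly once, uncompressed."""
    return sensors * points * bytes_per_point


raw = raw_size_bytes(SENSORS, POINTS_PER_SENSOR,
                     BYTES_PER_DOUBLE + BYTES_PER_TIMESTAMP)
observed = 200 * 10**9  # assumption: "200 GB" taken as decimal gigabytes

print(f"raw payload: {raw / 10**9:.2f} GB")
print(f"observed / raw: {observed / raw:.0f}x")
```

Under these assumptions the data directory is more than an order of magnitude larger than the raw payload.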
Answer:
This may be due to the compaction strategy.
You can fix this in either of two ways (you do not need both):
(1) Upgrade to version 0.12.2.
(2) Set the following option in iotdb-engine.properties: force_full_merge=true
The reason is:
Unsequenced data compaction in version 0.11.2 has two strategies.
For example:
Chunks in a sequence TsFile: [1,3], [4,5]
Chunks in an unsequence TsFile: [2]
(I use [1,3] to indicate the timestamps of the 2 data points in a Chunk.)
(1) With full merge (rewrite all data), we get a tidy sequence file: [1,2,3], [4,5]
(2) However, to speed up compaction, append merge is used by default, and we get a sequence TsFile: [1,3], [4,5], [1,2,3]. In this TsFile, [1,3] no longer has metadata at the end of the file; it is garbage data.
So if you have lots of out-of-order data that is merged frequently, this garbage accumulates and you get a very big TsFile.
The big TsFile will be tidy again after a new compaction.
You can also use TsFileSketchTool or example/tsfile/TsFileSequenceRead to inspect the structure of the TsFile.
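The two strategies can be illustrated with a toy model. This is only a sketch, not IoTDB code: the function names are made up, a chunk is just a sorted list of timestamps, and the file metadata is modelled as a list of indexes of the chunks that are still referenced. Late points that fall outside every chunk's time range are ignored to keep it short.

```python
def overlaps(chunk, points):
    """Late points whose timestamp falls inside the chunk's time range."""
    return [t for t in points if chunk[0] <= t <= chunk[-1]]


def full_merge(seq_chunks, unseq_points):
    """Rewrite every chunk, folding the overlapping late points into it."""
    return [sorted(set(c) | set(overlaps(c, unseq_points))) for c in seq_chunks]


def append_merge(file_chunks, live, unseq_points):
    """Append merged chunks at the end of the file.

    file_chunks: every chunk physically present in the file.
    live: indexes of the chunks the metadata still points to.
    Replaced chunks stay in the file but drop out of `live` (garbage).
    """
    file_chunks = list(file_chunks)
    new_live = []
    for i in live:
        extra = overlaps(file_chunks[i], unseq_points)
        if extra:
            file_chunks.append(sorted(set(file_chunks[i]) | set(extra)))
            new_live.append(len(file_chunks) - 1)
        else:
            new_live.append(i)
    return file_chunks, new_live


# The example from the explanation above.
tidy = full_merge([[1, 3], [4, 5]], [2])
file_chunks, live = append_merge([[1, 3], [4, 5]], [0, 1], [2])
print(tidy)         # full merge result
print(file_chunks)  # append merge: old [1, 3] is still in the file
print(live)         # only these chunk indexes are referenced by metadata

# Repeated late writes into the same chunk: dead copies pile up.
demo_chunks, demo_live = [[10, 20]], [0]
for late_point in (11, 12, 13):
    demo_chunks, demo_live = append_merge(demo_chunks, demo_live, [late_point])
garbage = sum(len(c) for i, c in enumerate(demo_chunks) if i not in demo_live)
live_points = sum(len(demo_chunks[i]) for i in demo_live)
print(garbage, "dead points vs", live_points, "live points")
```

In this model each append merge leaves a full copy of the old chunk behind, so a chunk that keeps receiving late points costs more dead bytes than live ones after only a few merges.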
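Before running those tools, it can help to find out which TsFiles are actually the large ones. The script below is a minimal sketch using only the Python standard library; the default directory name "data" and the ".tsfile" file extension are assumptions about your installation, so adjust them to your setup.

```python
import os


def largest_tsfiles(data_dir, top=10):
    """Return the `top` biggest *.tsfile files under data_dir as (size, path)."""
    sizes = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            # Skips companion files such as "*.tsfile.resource".
            if name.endswith(".tsfile"):
                path = os.path.join(root, name)
                sizes.append((os.path.getsize(path), path))
    return sorted(sizes, reverse=True)[:top]


# "data" is a placeholder; point this at your IoTDB data directory.
for size, path in largest_tsfiles("data"):
    print(f"{size / 2**30:8.2f} GiB  {path}")
```

The files at the top of this list are the natural candidates to pass to TsFileSketchTool.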