(I know the block size can be adjusted.)
The answer I gave was:
(1) so that the time to transfer a file is much larger than the disk seek time; the time to read a block is then dominated by the transfer itself rather than by seeking (a rough sketch follows the list),
(2) if the block is too large, that is also bad: processing a single block on one node takes too long, and the job is split into too few tasks, so many nodes get little or no work and the cluster is under-utilized,
(3) if the block is too small, a huge number of tasks is created, and the intermediate results produced by the map phase also grow substantially, which increases the communication overhead between the DataNodes and the NameNode and is bad for the cluster,
(4) Hadoop originally set the default to 64 MB with the hardware of that time in mind; as hardware has improved, a larger block size can be used.
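As a rough illustration of point (1), and of where a figure like 64 MB comes from, here is a back-of-the-envelope sketch. The 10 ms seek time and 100 MB/s transfer rate are assumed, illustrative numbers, not anything Hadoop defines:

```java
// Seek overhead for various block sizes, assuming 10 ms seek and 100 MB/s transfer.
public class SeekOverheadSketch {
    public static void main(String[] args) {
        double seekTimeSec = 0.010;   // assumed average disk seek time (10 ms)
        double transferMBps = 100.0;  // assumed sequential transfer rate

        long[] blockSizesMB = {1, 4, 16, 64, 128, 256};
        for (long mb : blockSizesMB) {
            double transferTimeSec = mb / transferMBps;
            double seekFraction = seekTimeSec / (seekTimeSec + transferTimeSec);
            System.out.printf("block %4d MB -> seek overhead %.2f%%%n", mb, seekFraction * 100);
        }
    }
}
```

With these numbers, a 1 MB block wastes about half of its I/O time on seeking, while 64-128 MB keeps the overhead around 1-2%, so the default ended up as a power of two in that range.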
These are the four points I answered, but the teacher was still not satisfied and told me to keep thinking.
I have thought about it for a long time. What is the real reason? Why was it set to 64 in particular?
Thank you very much, everyone!
CodePudding user response:
Too large is not good, and too small is not good either
CodePudding user response:
Can you explain the reasons? Ideally from a technical point of view. Thank you very much!
Is there a tutorial or any detailed material I could read?
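For reference, here is a minimal sketch of how the block size is adjusted, assuming a Hadoop 2.x+ client where the property is named dfs.blocksize (older releases used dfs.block.size); the path and the sizes below are purely illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CustomBlockSize {
    public static void main(String[] args) throws Exception {
        // Assumes core-site.xml / hdfs-site.xml on the classpath point at the cluster.
        Configuration conf = new Configuration();
        // Client-side override of the default block size (128 MB here).
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024);

        FileSystem fs = FileSystem.get(conf);

        // The block size can also be chosen per file through this create() overload.
        FSDataOutputStream out = fs.create(
                new Path("/tmp/example.dat"),  // hypothetical path
                true,                          // overwrite if it exists
                4096,                          // I/O buffer size
                (short) 3,                     // replication factor
                256L * 1024 * 1024);           // 256 MB blocks for this file
        out.writeUTF("hello");
        out.close();
    }
}
```

The cluster-wide default itself lives in hdfs-site.xml under the same dfs.blocksize property, which is what point (4) above refers to when it says the block size can be raised on newer hardware.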