But why split files into blocks at all?
At first I thought it was to speed up reading through concurrency, but according to Hadoop: The Definitive Guide, the blocks are read in order: one block is read, then the next, so there are no concurrent reads.
So I guess the purpose of blocks is one of the following:
1. A block gives MapReduce a uniform unit of computation, which makes map and reduce processing convenient;
2. It may simply define a unit of data storage, like a block on a Windows file system.
That's all I can think of. I'm not sure whether I'm right; what do you think?
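To make point 1 concrete, here is a minimal Python sketch of cutting a file into fixed-size blocks, each described by an (offset, length) pair. The function name `split_into_blocks` and the 128 MB default are assumptions for illustration (128 MB matches the usual HDFS `dfs.blocksize` default); this is not Hadoop's actual code.

```python
# Hypothetical sketch: cut a file of a given size into fixed-size blocks.
# Not Hadoop's implementation; names and defaults are illustrative only.

MB = 1024 * 1024

def split_into_blocks(file_size: int, block_size: int = 128 * MB) -> list[tuple[int, int]]:
    """Return (offset, length) pairs, one per block; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    offset = 0
    while offset < file_size:
        # Every block is block_size bytes except possibly the final one.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

if __name__ == "__main__":
    # A 300 MB file: in MapReduce each block would typically back one input split / map task.
    for offset, length in split_into_blocks(300 * MB):
        print(offset // MB, length // MB)
```

Running it prints `0 128`, `128 128`, `256 44` (offset and length in MB). Regarding point 2, one difference from a local file system: HDFS does not pad the short last block, so a 44 MB final block occupies about 44 MB of disk rather than a full 128 MB.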
Reply:
Following this thread; waiting for someone to post an answer, hehe.
Reply:
Actually it should be to improve reading speed: if files were not split into blocks, a large share of the time would be wasted locating data (seeking, or "addressing") rather than transferring it, which is not worth it.
Reply:
Reply to the post above: splitting into blocks does not improve reading speed; it is making the blocks *big* that improves reading speed. If you don't split at all, reading is even faster, because the file is read straight through with no extra time spent locating each block.
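The disagreement between the two replies above can be checked with a toy single-disk model in which each block costs one seek followed by sequential transfer. The 10 ms seek time and 100 MB/s rate are assumed round numbers, and the model deliberately ignores that HDFS blocks are spread across different DataNodes.

```python
# Illustrative single-disk model (an assumption, not a measurement):
# every block read costs one seek, then data streams at the disk transfer rate.
import math
from typing import Optional

SEEK_S = 0.010       # assumed average seek ("addressing") time: 10 ms
RATE_MB_S = 100.0    # assumed sequential transfer rate: 100 MB/s

def read_time_s(file_mb: float, block_mb: Optional[float]) -> float:
    """Time to read a whole file sequentially; block_mb=None means one unsplit file."""
    n_seeks = 1 if block_mb is None else math.ceil(file_mb / block_mb)
    return n_seeks * SEEK_S + file_mb / RATE_MB_S

if __name__ == "__main__":
    for block in (1, 64, 128, None):
        label = "unsplit" if block is None else f"{block} MB"
        print(f"{label:>8}: {read_time_s(1024, block):.2f} s")
```

For a 1024 MB file the model gives about 20.48 s with 1 MB blocks, 10.40 s with 64 MB, 10.32 s with 128 MB and 10.25 s unsplit. So the unsplit file is indeed fastest, as the second reply says, but with 128 MB blocks the extra seeks are already under 1% of the total time.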
Reply:
I think it is related to the design principles of HDFS. HDFS is designed so that seek (addressing) time is about 1% of transfer time. Seek time is typically around 10 ms, so transfer time should be about 1 s. Current disks transfer at roughly 100 MB/s, so one storage unit comes to about 100 MB, which is rounded to 128 MB. The block size can be adjusted according to the disk's transfer rate.
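The arithmetic in this reply can be written out as follows. The 10 ms seek time, the 1% ratio and 100 MB/s are the figures quoted in the reply; rounding up to the next power of two is my assumption about how 100 MB becomes 128 MB, and the function names are hypothetical.

```python
# Figures quoted in the reply (assumptions, not measurements).
SEEK_S = 0.010            # typical seek ("addressing") time: 10 ms
SEEK_TO_TRANSFER = 0.01   # design target: seek time = 1% of transfer time
RATE_MB_S = 100.0         # typical disk transfer rate: 100 MB/s

def target_block_mb(seek_s: float, ratio: float, rate_mb_s: float) -> float:
    """Block size (MB) whose transfer time makes seek_s exactly `ratio` of it."""
    transfer_s = seek_s / ratio       # 0.010 s / 0.01 = 1 s of transfer per seek
    return transfer_s * rate_mb_s     # 1 s * 100 MB/s = 100 MB

def round_up_pow2(size_mb: float) -> int:
    """Smallest power of two (in MB) that is >= size_mb."""
    p = 1
    while p < size_mb:
        p *= 2
    return p

if __name__ == "__main__":
    raw = target_block_mb(SEEK_S, SEEK_TO_TRANSFER, RATE_MB_S)
    print(f"raw target: {raw:.0f} MB, rounded: {round_up_pow2(raw)} MB")
```

With these figures the raw target is 100 MB, rounded up to 128 MB; a 200 MB/s disk would give 200 MB, rounded to 256 MB. For reference, the HDFS default `dfs.blocksize` is 128 MB in Hadoop 2.x and later (64 MB in 1.x), and it can be changed per cluster or per file.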