Notes on Reading the GFS Paper


As a paper, GFS inevitably summarizes an existing system; improvements that do not go far enough and features that are missing matter, and practicality and applicability are also important standards for judging the quality of the paper.

Even before reading it, one can list what a file system must consider: the files themselves, concurrency, data consistency, fault tolerance, machines going down, damaged disks, system errors, single points of failure, and the overlapping writes that concurrency may produce.

This article focuses the discussion on fault tolerance, scalability, data storage, and cluster storage.
Fault tolerance: component failure is one of the biggest challenges encountered when designing GFS. Neither the stability of the machines nor the reliability of the hard drives can be absolutely guaranteed, yet the consequences of a component failure are often fatal and can directly stop the system from working. GFS therefore builds fault diagnosis into the system: its diagnostic tools save logs of significant events so that problems can be traced, analyzed, and replayed. How much storage capacity these diagnostic logs themselves occupy, however, still needs further discussion.
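A minimal sketch of this idea, with a hypothetical EventLog type (the paper describes sequential, asynchronous diagnostic logs of events such as chunkservers going up and down, but not this exact interface): every significant event is appended with a timestamp, and reading the log back in order reconstructs the interaction history for analysis.

```go
package main

import (
	"fmt"
	"time"
)

// Event is one diagnostic record; a hypothetical shape, for illustration only.
type Event struct {
	At   time.Time
	Kind string // e.g. "chunkserver-up", "chunkserver-down", "rpc"
	Info string
}

// EventLog is an append-only in-memory diagnostic log.
// GFS writes such logs sequentially and asynchronously to local disk.
type EventLog struct {
	events []Event
}

// Append records one event with the current timestamp.
func (l *EventLog) Append(kind, info string) {
	l.events = append(l.events, Event{At: time.Now(), Kind: kind, Info: info})
}

// Replay walks the log in order, reconstructing the sequence of events.
func (l *EventLog) Replay(visit func(Event)) {
	for _, e := range l.events {
		visit(e)
	}
}

func main() {
	var log EventLog
	log.Append("chunkserver-up", "cs-17 joined")
	log.Append("rpc", "client A read chunk 42")
	log.Append("chunkserver-down", "cs-17 lost heartbeat")

	log.Replay(func(e Event) {
		fmt.Printf("%s %-16s %s\n", e.At.Format(time.RFC3339), e.Kind, e.Info)
	})
}
```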
Scalability: what is scalability? In a nutshell, the ability to do more. The paper raises GFS's scalability through several ideas: a single master that costs almost nothing because clients avoid communicating with it whenever possible, and a large chunk size of 64 MB, which reduces the number of client-master interactions, lowers the network load, and shrinks the amount of metadata the master needs to keep. At the same time, such large chunks can waste space through internal fragmentation, which GFS mitigates with lazy space allocation. The paper is also bold enough to put forward the idea of allowing a client to read data from other clients; letting clients read from one another, however, raises privacy problems and would be a huge challenge. The sketch below makes the metadata saving concrete.
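The 64 MB chunk size and the figure of less than 64 bytes of master metadata per chunk are from the paper; the 1 TiB file is an arbitrary example. The sketch shows the offset-to-chunk translation a client performs before asking the master, and why even a huge file costs the master little memory.

```go
package main

import "fmt"

const chunkSize = 64 << 20 // 64 MiB, the chunk size chosen in the paper

// chunkIndex maps a byte offset within a file to the chunk holding it;
// the client computes this locally, then asks the master only for that
// chunk's handle and replica locations.
func chunkIndex(offset int64) int64 {
	return offset / chunkSize
}

func main() {
	const fileSize int64 = 1 << 40 // an example 1 TiB file
	chunks := (fileSize + chunkSize - 1) / chunkSize

	// The paper reports the master keeps under 64 bytes of metadata per
	// chunk, so a 1 TiB file costs the master only about 1 MiB of memory.
	fmt.Printf("chunks: %d, master metadata: < %d KiB\n", chunks, chunks*64/1024)
	fmt.Printf("offset 200 MiB lives in chunk %d\n", chunkIndex(200<<20))
}
```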
Data storage: metadata is data about data. The master server stores its metadata in memory, which makes master operations very fast; although the number of chunks, and with it the carrying capacity of the whole system, is limited by the size of the master's memory, on the whole the advantages outweigh the disadvantages. The paper also proposes the operation log, which records metadata changes and the permanent identities of files and chunks; even after the master server fails, it can recover by replaying the operation log, which plays an important role in safe, durable storage. To guarantee data accuracy, consistency, and independence, GFS has each chunkserver maintain checksums with which it verifies whether the data it stores has been corrupted.
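A minimal sketch of that chunkserver-side checksumming, assuming CRC-32 as the checksum function (the paper specifies a 32-bit checksum per 64 KB block of a chunk, but not the exact algorithm): the chunkserver verifies the blocks a read touches before returning data, so corruption is never propagated to clients or other replicas.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

const blockSize = 64 << 10 // 64 KB blocks, as in the paper

// checksums computes one 32-bit checksum per 64 KB block of a chunk.
// CRC-32 is an assumption; the paper only says checksums are 32 bits.
func checksums(chunk []byte) []uint32 {
	var sums []uint32
	for off := 0; off < len(chunk); off += blockSize {
		end := off + blockSize
		if end > len(chunk) {
			end = len(chunk)
		}
		sums = append(sums, crc32.ChecksumIEEE(chunk[off:end]))
	}
	return sums
}

// verify recomputes block checksums and compares them with the stored
// ones, reporting the first corrupted block, if any.
func verify(chunk []byte, stored []uint32) (int, bool) {
	for i, sum := range checksums(chunk) {
		if sum != stored[i] {
			return i, false
		}
	}
	return -1, true
}

func main() {
	chunk := make([]byte, 3*blockSize)
	stored := checksums(chunk)

	chunk[blockSize+7] ^= 0xFF // simulate silent disk corruption in block 1
	if block, ok := verify(chunk, stored); !ok {
		fmt.Printf("block %d is corrupted; serve the read from another replica\n", block)
	}
}
```

On a mismatch, a real chunkserver returns an error to the requester, reports the corruption to the master, and the bad replica is restored from a good one.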
The Google File System shows the characteristics of a system that uses commodity hardware to support large-scale data processing. Its design keeps evolving: the authors mention optimizing the network stack to lift the current limit on the write throughput seen by an individual client. Big data built on such systems has unlimited potential.