Cloud computing is Tony

Time: 09-16

Before I came into contact with cloud computing, my notion of it was fuzzy: it seemed lofty and far away from us. After the teacher's introduction and recommendation, I read Google's three famous papers. I dare not claim to understand them; at best I swallowed them whole, and much of the theory still escapes me. Based on my shallow understanding so far, let me talk briefly about the Google File System (GFS).

First, we need to know what GFS is: a scalable distributed file system for large-scale data-intensive applications. Although GFS runs on cheap commodity hardware, it still provides fault tolerance through redundancy and delivers high aggregate performance to a large number of clients.
So GFS is a file system, and it shares many goals with earlier distributed file systems. However, its design is driven by analysis of Google's own application workloads and technical environment, both present and anticipated, so GFS departs noticeably from earlier ideas about distributed file systems.

First, differences in design thinking
1. Component failures are considered the norm rather than the exception.
2. Files are huge by traditional standards, so design parameters such as the block size must be reconsidered. Managing large files must be efficient; small files must be supported, but need not be optimized for.
3. Most files are mutated by appending new data rather than by overwriting existing data.
4. The workload consists mainly of two kinds of reads: large streaming reads and small random reads.
5. The workload also includes many large, sequential writes that append data to files. Written data sizes are similar to read sizes, and once written, files are seldom modified again. Small writes at arbitrary positions are supported, but need not be efficient.
6. The system must efficiently implement well-defined semantics for many clients concurrently appending to the same file (see the sketch after this list).
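To make point 6 concrete, here is a toy Python sketch of the record-append contract: the primary replica serializes appends and chooses the offset, so concurrent clients need no external locking. The class and method names are my own illustration, not the real GFS interface; only the contract (offset chosen by the system, 64 MB chunk limit) comes from the paper.

```python
import threading

class ToyChunk:
    """Toy stand-in for a GFS chunk's primary replica (illustrative only)."""
    def __init__(self, size_limit=64 * 1024 * 1024):  # 64 MB chunks, as in the paper
        self._buf = bytearray()
        self._lock = threading.Lock()  # stands in for the primary serializing mutations
        self._limit = size_limit

    def record_append(self, record):
        """Append `record`; return the offset the system chose, or None if full."""
        with self._lock:
            if len(self._buf) + len(record) > self._limit:
                return None  # real GFS pads the chunk and the client retries on a new one
            offset = len(self._buf)
            self._buf += record
            return offset

chunk = ToyChunk()
results = []

def producer(name):
    for i in range(3):
        results.append((name, i, chunk.record_append(f"{name}:{i}\n".encode())))

threads = [threading.Thread(target=producer, args=(f"client{n}",)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # every record landed at a distinct offset chosen by the "primary"
```

Note that no client ever names an offset; that is exactly what makes concurrent appends from many producers cheap.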

Second, the system architecture
A GFS cluster consists of a single Master node and multiple Chunk servers (Chunkservers), and is accessed by multiple Clients at the same time.
Master: manages metadata and coordinates system-wide activity.
Chunkserver: stores and maintains data chunks, and serves reads and writes of file data.
Client: asks the Master for metadata, then accesses the chunks on the corresponding Chunkserver according to that metadata (sketched below).
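As a minimal sketch of this division of labor, assuming the paper's fixed 64 MB chunk size, the read path might look like the toy Python below. All class and method names here are mine, for illustration only; what comes from the paper is the flow: a small metadata RPC to the master, then bulk data directly from a chunkserver.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunks, as in the GFS paper

class ToyChunkserver:
    """Stores chunk data keyed by chunk handle (toy, in-memory)."""
    def __init__(self):
        self.chunks = {}  # handle -> bytes

    def read_chunk(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]

class ToyMaster:
    """Holds only metadata: which handle and replicas serve each file chunk."""
    def __init__(self):
        self.metadata = {}  # (filename, chunk_index) -> (handle, [chunkservers])

    def lookup(self, filename, chunk_index):
        return self.metadata[(filename, chunk_index)]

class ToyClient:
    """Read path: metadata from the master, bulk data from a chunkserver."""
    def __init__(self, master):
        self.master = master
        self.cache = {}  # clients cache metadata so the master is not a bottleneck

    def read(self, filename, offset, length):
        chunk_index = offset // CHUNK_SIZE  # computed locally from the fixed chunk size
        key = (filename, chunk_index)
        if key not in self.cache:
            self.cache[key] = self.master.lookup(filename, chunk_index)  # small RPC
        handle, replicas = self.cache[key]
        # Bulk data moves directly between client and chunkserver;
        # the single master stays off the data path.
        return replicas[0].read_chunk(handle, offset % CHUNK_SIZE, length)

# Tiny demo: one chunk of one file, served by one chunkserver.
cs = ToyChunkserver()
cs.chunks["handle-0"] = b"hello, gfs"
master = ToyMaster()
master.metadata[("/logs/a", 0)] = ("handle-0", [cs])
print(ToyClient(master).read("/logs/a", 7, 3))  # b'gfs'
```

This is also why a single master works at all: it answers only small metadata questions, while the heavy traffic flows between clients and chunkservers.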
Third, the main problems GFS must address
(1) The system is built from many cheap, ordinary components, so component failure is the norm. The system must continuously monitor its own state, treat component failure as routine, and be able to detect failures quickly and recover failed components through redundancy (a toy sketch follows below).
(2) The system stores a modest number of very large files. By the usual standards our files are huge, and managing hundreds of millions of KB-sized small files would be most unwise. Design assumptions and parameters, such as I/O operation sizes and the Block (chunk) size, therefore need to be rethought.
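For point (1), here is a toy sketch of the monitor-and-recover idea: the master presumes a chunkserver dead when its heartbeats stop, and re-replicates any chunk that has fallen below its replica target. The constants and names are my own; only the heartbeat mechanism and the default of three replicas per chunk come from the paper.

```python
import time

REPLICATION = 3           # GFS defaults to three replicas per chunk
HEARTBEAT_TIMEOUT = 10.0  # seconds of silence before a server is presumed dead

class ToyMonitor:
    """Toy master-side loop: detect dead chunkservers via heartbeats and
    re-replicate any chunk that has fallen below its replication target."""
    def __init__(self):
        self.last_heartbeat = {}  # server -> timestamp of last heartbeat message
        self.replicas = {}        # chunk handle -> set of servers holding it

    def heartbeat(self, server):
        self.last_heartbeat[server] = time.time()

    def live_servers(self):
        now = time.time()
        return {s for s, t in self.last_heartbeat.items()
                if now - t < HEARTBEAT_TIMEOUT}

    def repair(self):
        live = self.live_servers()
        for handle, servers in self.replicas.items():
            servers &= live  # forget replicas held by dead servers
            missing = REPLICATION - len(servers)
            if missing <= 0:
                continue
            for target in list(live - servers)[:missing]:
                # Real GFS would order a live chunkserver to copy the chunk's
                # data; this toy version only records the new replica location.
                servers.add(target)

# Tiny demo: cs2 goes silent, its replica of chunk-7 is replaced by cs4.
m = ToyMonitor()
for s in ("cs1", "cs2", "cs3", "cs4"):
    m.heartbeat(s)
m.replicas["chunk-7"] = {"cs1", "cs2", "cs3"}
m.last_heartbeat["cs2"] -= 60  # simulate a minute of silence from cs2
m.repair()
print(m.replicas["chunk-7"])   # {'cs1', 'cs3', 'cs4'}
```

The point is that failure handling is a steady background activity, not an exceptional code path.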

Fourth, insights
Whether in hardware design or software design, a good system should be widely applicable, high-performance, and highly reliable, and reaching that goal is not easy. Besides working tirelessly to improve the performance of individual components, we can also consider scientific task allocation, letting a cluster of low-cost components share the risk. Especially today, when hardware technology is approaching its limits and performance demands keep rising across the board, guaranteeing quality with quantity and guaranteeing performance with probability is a reliable choice.