When the teacher assigned this paper, I spent hours reading it. I think that in this era of big data, the development of science and technology has brought us conveniences we have grown so used to that we seldom try to understand the charm behind them. As the teacher said, it is still worthwhile for us as college students to try to read a professional academic paper.
Although I am someone who knows almost nothing about cloud computing, I tried to learn as much as I could from the paper and to understand the general architecture of GFS. What follows are my own reflections, mainly on the architecture of GFS, how it reads and writes efficiently, the stability of the master server, and data recovery.
The architecture of GFS: a GFS cluster contains a single master node and multiple chunkservers, and is accessed by multiple clients at the same time. Files stored in GFS are divided into fixed-size chunks, and to make the data easy to identify, each chunk is assigned an immutable, globally unique 64-bit chunk handle when it is created; chunks are saved as Linux files on the chunkservers' local hard disks. Having a single master node greatly simplifies the design of GFS: a client asks the master which chunkservers hold the chunk it needs, caches this metadata for a period of time, and afterwards reads the data directly from a chunkserver. The metadata kept in the master's memory is mainly of three types: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk's replicas. My own understanding is that keeping these kinds of metadata in memory makes reads convenient and fast to implement, keeps the master and the chunkservers in a synchronized state, and avoids overloading the master to the point of crashing.
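To make this read path concrete for myself, I tried to write it down as a minimal Python sketch. The class and method names here (Chunkserver, Master.chunk_locations, the client-side cache) are my own stand-ins, not the paper's actual RPC interface; the sketch only shows the flow described above: one metadata request to the master, a client-side cache, and then direct reads from a chunkserver.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS chunks are a fixed 64 MB

class Chunkserver:
    """Stand-in for a chunkserver: chunks live here as Python bytes."""
    def __init__(self):
        self.chunks = {}  # chunk handle -> bytes

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]

class Master:
    """Stand-in for the single master: it holds metadata only, never file data."""
    def __init__(self):
        self.namespace = {}  # filename -> list of chunk handles
        self.locations = {}  # chunk handle -> list of Chunkserver replicas

    def chunk_locations(self, filename, chunk_index):
        handle = self.namespace[filename][chunk_index]
        return handle, self.locations[handle]

class Client:
    def __init__(self, master):
        self.master = master
        self.cache = {}  # (filename, chunk index) -> (handle, replicas)

    def read(self, filename, offset, length):
        index = offset // CHUNK_SIZE          # the client computes the chunk index itself
        key = (filename, index)
        if key not in self.cache:             # ask the master only on a cache miss
            self.cache[key] = self.master.chunk_locations(filename, index)
        handle, replicas = self.cache[key]
        # File data flows directly between client and chunkserver, never through the master.
        return replicas[0].read(handle, offset % CHUNK_SIZE, length)
```

This also helped me see why the single master is not a bottleneck: it answers only small metadata requests, while the heavy file data bypasses it entirely.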
Reading and writing files: as the number of clients grows, the probability that many clients read and write at the same time increases, which reduces overall read/write efficiency. It feels like the network cable we normally share: when several people use the same line, your speed slows down and a video game starts to lag. GFS supports two kinds of read operations: large streaming reads, which usually read hundreds of KB of data at a time or more, and small random reads. For record appends, GFS appends the record to the chunk and to its replicas, and guarantees that the data is appended atomically at least once.
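The "at least once" guarantee of record append confused me at first, so I wrote myself a toy simulation, under my own simplifying assumptions (a random failure per replica, and no padding of failed regions as the real system does). It only illustrates why retries can leave duplicate records on some replicas, which readers of GFS files are expected to tolerate.

```python
import random

class Replica:
    def __init__(self):
        self.log = []  # records appended to this replica, in order

def record_append(replicas, record, max_retries=5):
    """Toy model of GFS record append: the client retries the whole
    operation until one attempt succeeds on every replica. A record that
    landed on some replicas during a failed attempt stays there, so the
    guarantee is "appended atomically at least once", not exactly once."""
    for _ in range(max_retries):
        applied_everywhere = True
        for r in replicas:
            if random.random() < 0.2:   # simulated failure at this replica
                applied_everywhere = False
                break                   # the real primary pads the region; omitted here
            r.log.append(record)
        if applied_everywhere:
            return True                 # success: the record is now on all replicas
    return False
```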
Master server stability and data recovery: to keep the master reliable, its state is also replicated; the master's operation log and checkpoint files are copied to multiple machines. The operation log is the only persistent record of the metadata, and during disaster recovery the master restores the file system to its most recent state by replaying it. A checkpoint is something like a database snapshot: by reading the latest checkpoint file and then replaying only the limited number of log records written after it, the system can recover quickly. From the paper I also learned that a snapshot makes a copy of a file or a directory tree almost instantaneously, without interfering with other operations running at the same time, so through snapshots users can create branch copies of huge data sets as backups, and later commit them or roll back to the copied state.
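To convince myself of how checkpoints shorten recovery, here is a toy version under my own assumptions: the metadata is just a dict, the operation log a list of tuples, and the checkpoint a JSON file (the real master uses a compact B-tree-like checkpoint format). The shape of the recovery is the same: load the latest checkpoint, then replay only the log records written after it.

```python
import json

def save_checkpoint(state, log_position, path="checkpoint.json"):
    # Snapshot the in-memory metadata together with how much of the log it covers.
    with open(path, "w") as f:
        json.dump({"state": state, "log_position": log_position}, f)

def recover(log, path="checkpoint.json"):
    # Disaster recovery: load the last checkpoint, then replay only the
    # finite number of operation-log records that came after it.
    with open(path) as f:
        snap = json.load(f)
    state = snap["state"]
    for op, key, value in log[snap["log_position"]:]:
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state

log = [("set", "/foo", "chunkA"), ("set", "/bar", "chunkB"), ("delete", "/foo", None)]
save_checkpoint({"/foo": "chunkA"}, log_position=1)   # checkpoint covers the first record
print(recover(log))                                   # {'/bar': 'chunkB'}
```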
The above is my shallow discussion after reading the GFS paper. There are indeed many parts that my current knowledge is not yet enough to fully understand, but I did learn a thing or two, and while learning new knowledge I also encountered its charm.