Big data book report

The foundations of Google's big data technology
GFS
In the era of information explosion, the amount of data people can collect keeps growing. Simply adding more traditional hard disks no longer meets storage needs, and raising speed and capacity that way drives up cost and causes a series of other problems. A scalable distributed file system such as GFS solves the storage problem well and also changes the way data storage is managed. It uses a client-server model in which the storage is spread across computer nodes connected by a network. Files are split into chunks and each chunk is stored redundantly in several places, which reduces failures caused by component problems and improves reliability.
The file system metadata is managed by a single master that keeps it in memory. This simplifies the system design and improves performance, but because there is only one master, it can become a single point of failure that is hard to get around. The data itself is held by multiple chunkservers, which improves throughput. Every node is a Linux computer running a user-level server process; Linux is a reliable, secure, stable and free operating system. The advantage of GFS over other storage approaches is that it runs on a cluster of cheap commodity machines, which lowers cost while still providing reliability.
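To make the chunk and replica idea concrete, here is a minimal sketch in Python. It is not how GFS is actually implemented: the names `Master`, `ChunkServer`, `CHUNK_SIZE` and `REPLICAS` are made up for illustration, the chunk size is tiny, placement is simple round-robin, and the data passes through the master only to keep the example short.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 4   # bytes per chunk; tiny on purpose (illustrative assumption)
REPLICAS = 3     # copies kept of every chunk (illustrative assumption)


@dataclass
class ChunkServer:
    name: str
    chunks: dict = field(default_factory=dict)   # chunk_id -> bytes


@dataclass
class Master:
    servers: list                                # needs at least REPLICAS servers
    # In-memory metadata: filename -> list of (chunk_id, [server names])
    files: dict = field(default_factory=dict)

    def write(self, filename, data):
        layout = []
        for offset in range(0, len(data), CHUNK_SIZE):
            index = offset // CHUNK_SIZE
            chunk_id = f"{filename}#{index}"
            # Round-robin placement on REPLICAS distinct servers.
            targets = [self.servers[(index + r) % len(self.servers)]
                       for r in range(REPLICAS)]
            for server in targets:
                server.chunks[chunk_id] = data[offset:offset + CHUNK_SIZE]
            layout.append((chunk_id, [s.name for s in targets]))
        self.files[filename] = layout

    def read(self, filename, failed=()):
        by_name = {s.name: s for s in self.servers}
        out = b""
        for chunk_id, names in self.files[filename]:
            # Use the first replica whose server has not failed.
            live = next(n for n in names if n not in failed)
            out += by_name[live].chunks[chunk_id]
        return out
```

With four chunkservers, a file written through `master.write("a.txt", b"hello world")` can still be read back by `master.read("a.txt", failed={"cs0", "cs1"})`, because every chunk has a copy on a server that is still alive.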
MapReduce
MapReduce is a distributed computing method for processing and producing large-scale data sets. It borrows the map and reduce operations from Lisp. Map operations can run with a high degree of parallelism, which is very useful for applications with high performance requirements and for parallel computing in general. It also makes life much easier for programmers who are not familiar with distributed parallel programming, since their programs can still run on a distributed system. The map step is executed by multiple workers on different parts of the data and the results are then merged; the reduce step makes sure that all values produced by map for the same key are combined. MapReduce achieves reliability by distributing the large-scale operations on the data set to the nodes of the network and keeping track of the work on each node. Its main functions are: (1) data partitioning and computing task scheduling, (2) data/code co-location, (3) system optimization, (4) error detection and recovery.
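The map, merge and reduce steps described above can be sketched as a small word count in Python. This is only a single-machine illustration under my own assumptions: threads stand in for the workers, and the function names `map_words`, `shuffle`, `reduce_counts` and `map_reduce` are invented for this example, not taken from any MapReduce library.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(split):
    # Map: emit a (key, value) pair for every word in one input split.
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs_per_split):
    # Merge: group the values from all workers by key.
    groups = defaultdict(list)
    for pairs in pairs_per_split:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(key, values):
    # Reduce: combine all values that share the same key.
    return key, sum(values)


def map_reduce(splits, workers=3):
    # Each split is mapped independently, so the map calls can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_words, splits))
    groups = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in groups.items())
```

Calling `map_reduce(["big data", "Big table", "data data"])` counts each word across all three splits. A real system would also handle scheduling, data placement and failed workers, which this sketch leaves out.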
BigTable
BigTable is a distributed data storage system designed by Google, a non-relational database used to handle huge amounts of data. It is a large, fault-tolerant, self-managing system that is built on top of GFS and is commonly used together with MapReduce. It stores distributed structured data at the scale of petabytes, is easy to expand, supports dynamic scaling efficiently and runs on cheap equipment. It is better suited to read operations than to write operations, and it does not fit the traditional relational database model. It uses one master server plus many Tablet servers. The master assigns Tablets to Tablet servers, detects newly added and expired Tablet servers, balances the load between Tablet servers, garbage-collects files in GFS and handles changes to the data model (schema). This is how it obtains good load balancing.
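The master's job of assigning Tablets and balancing load can be sketched roughly as follows. This is a toy model under my own assumptions, not BigTable's real algorithm: the class `TabletMaster` and its methods are hypothetical, load is measured only by the number of Tablets, and at least one server is assumed to remain when another one is removed.

```python
class TabletMaster:
    def __init__(self):
        self.assignment = {}   # server name -> set of tablet ids

    def add_server(self, server):
        # A newly detected Tablet server starts empty, then load is evened out.
        self.assignment.setdefault(server, set())
        self.rebalance()

    def remove_server(self, server):
        # An expired server's Tablets are reassigned to the remaining servers.
        orphans = self.assignment.pop(server, set())
        for tablet in sorted(orphans):
            self.assign(tablet)

    def assign(self, tablet):
        # Give the Tablet to the least loaded server (ties broken by name).
        target = min(self.assignment,
                     key=lambda s: (len(self.assignment[s]), s))
        self.assignment[target].add(tablet)
        return target

    def rebalance(self):
        # Move Tablets from the busiest to the idlest server until
        # their loads differ by at most one.
        while True:
            busiest = max(self.assignment, key=lambda s: len(self.assignment[s]))
            idlest = min(self.assignment, key=lambda s: len(self.assignment[s]))
            if len(self.assignment[busiest]) - len(self.assignment[idlest]) <= 1:
                return
            self.assignment[idlest].add(self.assignment[busiest].pop())
```

For example, with six Tablets spread over two servers, calling `add_server("ts3")` leaves each of the three servers with two Tablets, and a later `remove_server("ts1")` hands its Tablets back to the other two.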

CodePudding user response:

Feel very much like