Google company three important papers-CodePudding

Read three papers of Google company, said although the inside of the description in my now to see a lot of all don't understand, but still feel very have feelings, have to say that Google company today could have such a strong vitality, is it a certain sense, the three papers are Google file system, graphs, Big table,
Google file system is a kind of file system, to be able to run on cheap equipment, Big table is a distributed structured storage system, Google used to store a lot of data of the project, I want to chat here, I am some personal point of view of graphs, (because I learned a little python, there are key/value pair, so it is easy to understand)
Graphs model map, reduce two parts, the three big data papers according to Google, graphs programming model principle is: the user custom map function takes an input key/value pair, and then produce a set of intermediate key/value pair, graphs library among all have the same key value I middle value together runnin reduce function, using my own idea is: put a large amount of data segmentation, then each small piece of data processing and the return value, and then combine the value output,
Why need graphs? Because of a single machine can't handle huge amounts of data, or the price is too big, so can not meet the need of requirement, the use of graphs allows developers to focus on business logic, while complex calculations can let this framework to complete, so as to improve the development efficiency, reduce the burden of developers,
The advantages of graphs:
1. Easy to programming
2. Good scalability
3. High fault-tolerance
4. Suitable for PB levels above the big data distributed offline batch
Disadvantages:
1. Hard to real-time computing graphs processing offline data is stored on the local disk)
2. Can't flow calculation graphs design processing of the data is static)
3. Hard to DAG graphs of these parallel computing are mostly based on the data flow model of circulation, that is, a calculation process, the different between computing nodes keep highly parallel, the data flow model for those who need the iteration algorithm of repeated use of a particular data set cannot run efficiently,
Data is money, when I contact with the crawler, I realized after a large amount of data processing analysis, it is possible to get a special useful things, and now after watching the Google company for large data processing, the more realize the crisis of big data, finally wish to keep learning, constantly improve themselves,

CodePudding user response:

Google has always been the spirit of the Internet community leaders
Not in the Google development in Google

CodePudding user response:

Great thanks for sharing