Thoughts on Google's three important papers
I recently read Google's three important papers and want to write down some of my impressions.
Since the start of the 21st century, with the spread of Internet access, networks have generated more and more data, and people face two key problems: 1. how to store huge amounts of data, and 2. how to compute over huge amounts of data. The three papers Google published between 2003 and 2006 offered a way of thinking about both problems and ushered in the big data era.
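GFS (the Google File System) is a scalable distributed file system for large, distributed applications that access large amounts of data. GFS consists of a single master and a large number of chunkservers. Google uses one master to hold the directory and index information in order to simplify the system's structure and improve performance, but this risks making the master a single point of failure or a bottleneck. To keep the master from becoming a bottleneck, Google makes each chunk very large (64 MB in the paper). Because applications access data with good locality, large chunks reduce the number of interactions between the application and the master, and most of the data traffic flows between the application and the chunkservers.
GFS not only meets the demand for storage capacity but also keeps files well organized, and the system has been widely applied. GFS divides the nodes of the whole system into three roles: client, master, and chunkserver. Its design has several distinctive features:
1. Data flow and control flow are separated. Only control flow passes between the client and the master, never data flow, which greatly reduces the load on the master. File data is transferred directly between the client and the chunkservers. Because a file is split into multiple chunks stored across servers, the client can access several chunkservers at the same time, so I/O across the whole system is highly parallel and overall performance improves.
2. A central-server model is used: (1) chunkservers are easy to manage; (2) the master knows the state of every chunkserver in the system, which makes load balancing easy; (3) there is no metadata consistency problem.
3. Neither the client nor the chunkserver needs to cache file data: (1) most file operations are streaming reads and writes, with little repeated access; (2) data on a chunkserver is stored in the local file system (a Linux file system), so if frequent access does occur, the local file system's cache can serve it; (3) if a system-level cache were built, keeping the cached data consistent with the data on the chunkservers would be hard to guarantee.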
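The separation of control flow and data flow described above can be sketched in a few lines of Python. This is only a toy, single-process illustration and not the GFS implementation: the class names, the `locate` method, the handle format, and the 8-byte chunk size are all assumptions made for the example, and replication and fault tolerance are left out.

```python
CHUNK_SIZE = 8  # bytes; tiny on purpose for illustration (assumption, not the real size)


class ChunkServer:
    """Stores chunk contents, keyed by a chunk handle."""

    def __init__(self):
        self.chunks = {}

    def write_chunk(self, handle, data):
        self.chunks[handle] = data

    def read_chunk(self, handle, start, length):
        return self.chunks[handle][start:start + length]


class Master:
    """Holds metadata only: file name -> list of (handle, chunkserver)."""

    def __init__(self):
        self.files = {}
        self.lookups = 0  # number of control-flow requests served

    def register(self, name, locations):
        self.files[name] = locations

    def locate(self, name, chunk_index):
        """Control flow: tell the client where a chunk lives (None if past the end)."""
        self.lookups += 1
        locations = self.files[name]
        return locations[chunk_index] if chunk_index < len(locations) else None


class Client:
    def __init__(self, master):
        self.master = master

    def write(self, name, data, servers):
        # Simplified: chunks are spread round-robin and then registered;
        # the master only ever receives locations, never file contents.
        locations = []
        for i in range(0, len(data), CHUNK_SIZE):
            index = i // CHUNK_SIZE
            handle = f"{name}#{index}"
            server = servers[index % len(servers)]
            server.write_chunk(handle, data[i:i + CHUNK_SIZE])
            locations.append((handle, server))
        self.master.register(name, locations)

    def read(self, name, offset, length):
        out = b""
        while length > 0:
            index, start = divmod(offset, CHUNK_SIZE)
            location = self.master.locate(name, index)  # control flow
            if location is None:
                break
            handle, server = location
            # Data flow: bytes come straight from the chunkserver.
            piece = server.read_chunk(handle, start, min(length, CHUNK_SIZE - start))
            if not piece:
                break
            out += piece
            offset += len(piece)
            length -= len(piece)
        return out


if __name__ == "__main__":
    servers = [ChunkServer(), ChunkServer()]
    master = Master()
    client = Client(master)
    client.write("log", b"abcdefghijklmnopqrst", servers)
    print(client.read("log", 6, 6), master.lookups)
```

A read that spans two chunks costs two small lookups at the master, while the bytes themselves come from two different chunkservers. With a larger chunk size the same read would need fewer lookups, which is the effect the large chunk is meant to have.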
MapReduce is a programming model for parallel computation over large data sets (larger than 1 TB). As the name suggests, it separates the work into Map and Reduce, and its main idea rests on these two steps: the Map function processes a collection of key/value pairs, and the Reduce function merges the results.
BigTable is a NoSQL database in which the data forms one big table, and it trades storage space for performance. Google's two later papers, MapReduce and BigTable, both build on GFS, and together the three core technologies make up a complete distributed computing architecture.
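The Map and Reduce steps described above can be illustrated with the classic word-count example. The sketch below runs in a single process and only shows the shape of the programming model; the function names `map_fn`, `reduce_fn`, and `map_reduce` are made up for this example and do not come from Google's library, which distributes the same steps across many machines.

```python
from collections import defaultdict


def map_fn(_doc_id, text):
    """Map: turn one (document id, text) pair into intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: merge all the values collected for one key."""
    return word, sum(counts)


def map_reduce(inputs, map_fn, reduce_fn):
    """Run map, group the intermediate pairs by key, then run reduce per key."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


if __name__ == "__main__":
    docs = [("d1", "the quick fox"), ("d2", "the lazy dog the end")]
    print(map_reduce(docs, map_fn, reduce_fn))
```

The grouping step in the middle is what lets the work be split up: every call to `map_fn` is independent of the others, and so is every call to `reduce_fn`, so in a real system each could run on a different machine.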
As a sophomore, I still need a long period of accumulated knowledge before I can fully understand Google's three papers. For now, the most important thing is to use these papers to learn the ideas behind big data, broaden my horizons, and lay a foundation in this area. I will keep working hard!