Notes on Reading Google's Three Major Papers

Time:10-27


Google's three most important papers are Bigtable (a distributed storage system for structured data), the Google File System, and MapReduce. I have read all three, but only roughly: they contain so many technical terms that I did not understand everything. My notes follow.
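First is Bigtable, the distributed storage system for structured data. As the name suggests to me, its structure resembles a tree: a trunk divides into branches and then into ever smaller units. The paper's example starts from a Chubby file and shows four levels. The relationship between two adjacent levels is like a mapping: one larger node maps to several smaller ones, and their number is not fixed. The link between a larger node and its smaller ones is location information, and it is bounded: the tablet location hierarchy has no more than three levels. As for the API, the paper's examples use C++, which resembles C and is convenient to apply.
Later, the idea that Bigtable's components run on shared machines caught my attention: its processes share machines with the processes of other applications, so how are they kept apart? Beyond that, a cluster management system schedules jobs, handles machine failures, and monitors machine status. It behaves like a feedback system, though a far more advanced one.
For storage, the SSTable format seems relatively rigid, but because it is persistent and immutable it keeps data stable and reliable. Since each key corresponds to one value, a lookup can be completed in a single search, so the format is also flexible and convenient. This holds not only inside a single small file. The Chubby service at the trunk keeps five active replicas and is usable only while a majority of them are running; the replicas are linked to one another and kept consistent. Perhaps because I do not know much about programming, I find this remarkable: the master replica using an algorithm to correct differences between replicas seems as marvelous as negative feedback in nature.
Because the system is commercial, storage resources are limited, and the data volume is enormous, data must be both added and deleted. A client and its service session also correspond one to one, tied together by a lease with a time limit, which keeps a huge, complex system flexible. In addition, there is only one master, and it manages all the lower-level user tables. If the master stops responding it loses its role and a replacement takes over management (as I read the paper, the master steps down when its Chubby session expires and a new master is then started). This flexible mechanism means that a broken master does not leave the system unusable.
The designers' thinking is rigorous yet flexible, thorough, reasonable, and simple. Clear logic and implementability are the direct reasons the idea could be realized. Among the refinements, locality groups cut the amount of data read to save time, compression saves space, filters bring both benefits, and the commit-log change reduces the number of files and saves time. All of these are feasible and simple.
Performance does not scale linearly: in very large clusters the gain per server drops off noticeably, and load imbalance is unavoidable. Even so, pooling and reusing resources in this way has clear value, and the imbalance can be reduced by algorithms, which shows that the design is adaptable and has room for improvement. Later researchers can keep optimizing it in many directions and draw on it. Its data model is built on a matrix-like form.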
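To make the tablet location hierarchy more concrete, here is a minimal Python sketch of a three-step lookup: a Chubby file points to a root tablet, the root tablet points to a metadata tablet, and the metadata tablet points to the server holding the user tablet. The class `Tablet`, the function `locate`, the row ranges, and the server names are all hypothetical and chosen only for illustration; the real system encodes locations differently and caches them on the client.

```python
from bisect import bisect_left


class Tablet:
    """A sorted set of row ranges; each range is identified by its end row key."""

    def __init__(self, entries):
        # entries: list of (end_row_key, target), sorted by end_row_key
        self.end_keys = [key for key, _ in entries]
        self.targets = [target for _, target in entries]

    def lookup(self, row_key):
        # The first range whose end key is >= row_key is the one covering it.
        i = bisect_left(self.end_keys, row_key)
        if i == len(self.end_keys):
            raise KeyError(row_key)
        return self.targets[i]


LAST = "\uffff"  # sentinel end key for the final range (assumption)

# Metadata tablets map row ranges to (hypothetical) tablet servers.
meta_1 = Tablet([("g", "tabletserver-A"), ("m", "tabletserver-B")])
meta_2 = Tablet([("t", "tabletserver-C"), (LAST, "tabletserver-D")])

# The root tablet maps row ranges to metadata tablets.
root_tablet = Tablet([("m", meta_1), (LAST, meta_2)])

# The Chubby file only records where the root tablet is.
chubby_file = {"root_tablet": root_tablet}


def locate(row_key):
    """Follow the three levels and return the server for row_key."""
    root = chubby_file["root_tablet"]  # level 1: Chubby file -> root tablet
    meta = root.lookup(row_key)        # level 2: root tablet -> metadata tablet
    return meta.lookup(row_key)        # level 3: metadata tablet -> user tablet


print(locate("apple"))
print(locate("pear"))
```

Each level is just a sorted mapping from a row range to the next level, which is why one larger node can fan out to an arbitrary number of smaller ones while the depth stays fixed.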
Next is the Google File System (GFS). It has many similarities with Bigtable, but also differences. Its interface is likewise a set of API functions, but it additionally provides snapshot and record append operations. Its architecture is again tree-like, but here the components (client, master, and chunkservers) are linked pairwise and communicate in both directions. The GFS master gives each chunk a fixed identity and decides where chunks are placed, which clearly differs from Bigtable, and the unit size is different too. GFS keeps the master's involvement small and makes chunks large: fewer trips through the master on the way to a chunk save time and improve efficiency, and they also keep the master node from being overloaded, killing two birds with one stone. As in Bigtable, relationships are recorded as mappings: metadata establishes the correspondence between files and chunks, which keeps the servers stable and the data consistent. A consistency model determines the state of each file region, that is, whether it is consistent and whether it is defined after a modification; this part is complex and cumbersome. Master operation, fault tolerance, and diagnosis are further distinctive features. GFS treats component failure as the norm, and its optimization for appending writes and for reads is distinctive as well, reducing data errors and workload.
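The point about reducing trips to the master can be illustrated with a small Python sketch. It assumes the 64 MB chunk size given in the GFS paper; the classes `Master` and `Client`, the file name, the chunk handles, and the chunkserver names are hypothetical. The client converts a byte offset into a chunk index, asks the master once for that chunk's handle and replica locations, and then reuses the cached answer.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size given in the GFS paper


class Master:
    """Holds only metadata: file -> chunk handles, handle -> replica locations."""

    def __init__(self):
        self.file_chunks = {}      # file name -> list of chunk handles
        self.chunk_locations = {}  # chunk handle -> list of chunkserver names
        self.requests = 0          # counts how often clients contact the master

    def lookup(self, filename, chunk_index):
        self.requests += 1
        handle = self.file_chunks[filename][chunk_index]
        return handle, self.chunk_locations[handle]


class Client:
    """Translates (file, offset) into (chunk handle, replicas, offset in chunk)."""

    def __init__(self, master):
        self.master = master
        self.cache = {}  # (file name, chunk index) -> (handle, replicas)

    def locate(self, filename, offset):
        chunk_index = offset // CHUNK_SIZE
        key = (filename, chunk_index)
        if key not in self.cache:
            # Only a cache miss costs a round trip to the master.
            self.cache[key] = self.master.lookup(filename, chunk_index)
        handle, servers = self.cache[key]
        return handle, servers, offset % CHUNK_SIZE


# Hypothetical metadata for one two-chunk file.
master = Master()
master.file_chunks["/logs/web.log"] = ["handle-1", "handle-2"]
master.chunk_locations["handle-1"] = ["cs1", "cs2", "cs3"]
master.chunk_locations["handle-2"] = ["cs2", "cs3", "cs4"]
client = Client(master)
```

With this setup, every read inside the first 64 MB of `/logs/web.log` resolves to the same chunk, so after the first `client.locate(...)` call the master is not contacted again for that range. The larger the chunk, the more reads a single lookup covers.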
Last is MapReduce. The model has two parts, map and reduce. It runs on top of GFS and uses distributed computing: a large amount of data is split across many low-performance computers, and after processing the results are gathered into the output. Because the participating machines are low-end, failures are expected. If the master itself fails, the workers do not elect a new one: the implementation described in the paper aborts the job so that the client can retry, and the paper notes that the master could instead be restarted from a checkpoint. For the workers, the master pings each one periodically; if no response arrives within a certain time, the master reassigns that worker's tasks to other workers, and once the worker recovers it can be given new tasks. This makes full use of idle resources and saves time.
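The map and reduce parts can be shown with the classic word-count example, sketched here in Python as a single-process simulation. The function names `map_fn`, `reduce_fn`, and `map_reduce` are illustrative, and the grouping step in the middle stands in for what a real cluster does by moving data between workers; nothing here models failures or GFS.

```python
from collections import defaultdict


def map_fn(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Reduce: combine all values that were emitted for one key."""
    return word, sum(counts)


def map_reduce(splits):
    # Map phase: on a cluster each split would go to a different worker.
    intermediate = defaultdict(list)
    for split in splits:
        for key, value in map_fn(split):
            # Group values by key (the shuffle step between map and reduce).
            intermediate[key].append(value)
    # Reduce phase: one call per distinct key, then gather the output.
    return dict(reduce_fn(key, values) for key, values in sorted(intermediate.items()))


splits = ["the quick fox", "the lazy dog", "The end"]
word_counts = map_reduce(splits)
print(word_counts)
```

Because each call to `map_fn` looks at only one split and each call to `reduce_fn` looks at only one key, the work can be spread over many machines and any failed piece can simply be run again elsewhere.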
The above are my rough impressions and questions after reading the papers. The designs are distinctive. They made me think that approaching similar problems from a different perspective can lead to different and better results. They also showed me that clear logic, rigorous theory, and real experiments have to be combined for a theory to come close to its ideal in practice.