The major difference between Spark and MapReduce

Time:09-22

At the core of Spark is the concept of the RDD (Resilient Distributed Dataset). In recent years, as data volumes have grown, distributed cluster computing frameworks such as MapReduce and Dryad have been widely used to process them. Most of these well-designed computation models offer good fault tolerance, scalability, load balancing, and a simple programming model, which has made them popular with many enterprises and with most users doing large-scale data processing.
However, most of these parallel frameworks, MapReduce included, are built on an acyclic data flow model: a job reads data from a shared file system, performs its computation, and writes the results back to shared storage, with the compute nodes running highly in parallel throughout. This model is inefficient for iterative algorithms that need to reuse a particular dataset repeatedly, because every pass after the first must read the same data from shared storage again.
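The cost of that read-compute-write cycle can be illustrated with a minimal sketch in plain Python. It is a toy model, not MapReduce code: `SharedStorage`, `run_job`, and `iterate_with_jobs` are hypothetical names invented here, and the "shared file system" is just a dictionary that counts accesses.

```python
class SharedStorage:
    """Toy stand-in for a shared file system; counts reads and writes."""

    def __init__(self):
        self.files = {}
        self.reads = 0
        self.writes = 0

    def read(self, name):
        self.reads += 1
        return list(self.files[name])

    def write(self, name, data):
        self.writes += 1
        self.files[name] = list(data)


def run_job(storage, in_name, out_name, fn):
    """One acyclic job: read the input, compute, write the result back."""
    data = storage.read(in_name)
    storage.write(out_name, [fn(x) for x in data])


def iterate_with_jobs(storage, points_name, steps):
    """Iterative fit of the mean, where every step is a separate job."""
    w = 0.0
    for _ in range(steps):
        # Each step rereads the same input dataset from shared storage.
        run_job(storage, points_name, "grad", lambda x: 2 * (w - x))
        grads = storage.read("grad")
        w -= 0.1 * sum(grads) / len(grads)
    return w


storage = SharedStorage()
storage.write("points", [1.0, 2.0, 3.0])
w = iterate_with_jobs(storage, "points", 50)
print(storage.reads)  # 100: two storage reads per iteration
```

In this sketch the input never changes, yet it is fetched from storage on every one of the 50 iterations. That repeated I/O is the inefficiency described above.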
Spark and its RDDs were developed to solve this problem. Spark uses a specially designed data structure called the RDD, whose key feature is that a distributed dataset can be reused across different parallel operations. This feature is what distinguishes Spark from other parallel data flow frameworks such as MapReduce.
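The idea of reusing a dataset across operations can be sketched in plain Python as follows. This is only an illustration of the principle, not Spark's RDD implementation or API: `CachedDataset`, `load_points`, and `fit_mean` are hypothetical names, and the sketch ignores distribution and fault tolerance entirely.

```python
class CachedDataset:
    """Toy reusable dataset: loaded once, then kept in memory."""

    def __init__(self, loader):
        self._loader = loader  # function that reads from slow storage
        self._data = None      # filled on first use, reused afterwards

    def collect(self):
        if self._data is None:
            self._data = self._loader()
        return self._data

    def map(self, fn):
        # Derive a new dataset lazily from this one.
        return CachedDataset(lambda: [fn(x) for x in self.collect()])


loads = {"count": 0}  # how many times slow storage was actually read


def load_points():
    loads["count"] += 1
    return [1.0, 2.0, 3.0]


def fit_mean(points, steps):
    """Iterative fit of the mean that reuses the same in-memory dataset."""
    w = 0.0
    for _ in range(steps):
        grads = points.map(lambda x: 2 * (w - x)).collect()
        w -= 0.1 * sum(grads) / len(grads)
    return w


points = CachedDataset(load_points)
w = fit_mean(points, 50)
print(loads["count"])  # 1: the input is read from storage only once
```

In Spark itself, the comparable step is marking an RDD for reuse with `cache()` or `persist()` before running the iterations over it.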