The Spark cache problem

Time:10-11

Read data from Hive and then cache it:

var data = spark.sql("select * from a").cache

Call data.show for the first time.

After that, a later computation modifies table a.

At this point, call data.show a second time. Is the result the same as before?


This is what I am seeing now: even though data is cached, once the data source has changed and data is computed again, the first and second results differ, so it looks as if the cache did not take effect. How can we make data stay the same even though the data source has changed?

CodePudding user response:

Could any experts help with this, please?

CodePudding user response:

The data has already been read and cached in Spark's memory, which is equivalent to taking a snapshot (or a copy) inside Spark.

CodePudding user response:

I had the same idea as the previous reply, but in testing I found that once the original table is changed, the result is different. It feels as if the cache was invalidated.

CodePudding user response:

There is one more thing, by the way: the Spark SQL cache is different from the RDD cache. From what I can see now, the Spark SQL cache uses memory and disk, while the RDD cache uses memory only.
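The difference in default storage levels can be checked directly. The following is a minimal sketch, assuming a local SparkSession and made-up stand-in data instead of the Hive table from the question; the names df, rdd, dfLevel and rddLevel are only illustrative. It uses Dataset.storageLevel and RDD.getStorageLevel to inspect what each cache() call selected.

```scala
import org.apache.spark.sql.SparkSession

// Local session purely for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("cache-storage-levels")
  .getOrCreate()

// Hypothetical stand-in data, so no Hive table is needed.
val df = spark.range(5).toDF("id").cache()
val rdd = spark.sparkContext.parallelize(1 to 5).cache()

// Inspect the storage level each cache() call registered.
val dfLevel = df.storageLevel
val rddLevel = rdd.getStorageLevel

println(s"DataFrame: useMemory=${dfLevel.useMemory}, useDisk=${dfLevel.useDisk}")
println(s"RDD:       useMemory=${rddLevel.useMemory}, useDisk=${rddLevel.useDisk}")
```

If a specific level is wanted for either one, persist(...) with an explicit StorageLevel can be used instead of cache().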

CodePudding user response:

It is the same. Whether or not there is a cache, it will not change: an RDD or DataFrame is read-only and can only be transformed from one state into another. The object itself is never modified.

CodePudding user response:

That's right, I found the same thing. With Spark SQL, after the data table is modified, the cache built from that data source is invalidated and the data is recomputed. It really is a pitfall.

CodePudding user response:

Spark works on a principle similar to a relational database, and this behavior is the correct one.
1. Spark SQL is similar to SQL in a database.
2. The Spark SQL cache is equivalent to a query cache, or a query view: once it exists, there is no need to go back and read the source data for every operation.
3. When the data in the database changes, Spark refreshes the cache so that queries see the change.
4. If the cached data never changed, wouldn't the calculation be using dirty data? Wouldn't the results then be wrong?
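For the case in the original question, where a fixed copy is wanted even after table a changes, one possible approach is to materialize the data so that its plan no longer refers to the table. This approach is not mentioned anywhere in the thread. The sketch below assumes a local SparkSession, a temporary warehouse directory and a small made-up table named a; the names cached and frozen are only illustrative. It uses Dataset.localCheckpoint(), which eagerly computes the data and cuts its lineage.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Local session with a throwaway warehouse directory (assumption for the demo).
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("cache-vs-snapshot")
  .config("spark.sql.warehouse.dir", Files.createTempDirectory("wh").toString)
  .getOrCreate()

// Hypothetical stand-in for the Hive table `a` from the question (3 rows).
spark.range(3).toDF("id").write.saveAsTable("a")

// Plain cache, as in the question; count() materializes it.
val cached = spark.sql("select * from a").cache()
cached.count()

// Eagerly materialized copy whose plan no longer refers to table `a`.
val frozen = spark.sql("select * from a").localCheckpoint()

// Modify the source table through Spark SQL.
spark.sql("INSERT INTO a VALUES (100)")

val cachedCount = cached.count() // may reflect the insert, as reported in the thread
val frozenCount = frozen.count() // still counts only the rows present at checkpoint time

println(s"cached: $cachedCount, frozen: $frozenCount")
```

Note that localCheckpoint() keeps the data on the executors and is not fault tolerant, so for anything long-lived, writing the snapshot to a separate table is likely the safer choice.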