Just learning Spark, and I'm wondering: in a Spark script, should I clean up DataFrames after executing the code that uses them? For example:
# Do something on friends DF...
friendsByAge = lines.select("age", "friends")
friendsByAge.groupBy("age").avg("friends").show()
# now do something unrelated to friends DF
In the case above, is the friendsByAge DataFrame kept in memory for the entire execution of the driver script (even after I no longer need it)? If so, should I clean it up somehow, or is it removed from memory once I call show()?
CodePudding user response:
A DataFrame is evaluated lazily, so it's only computed when you run an action such as show().
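For illustration, a minimal runnable sketch of that laziness (the SparkSession setup and the toy data are my own stand-ins for the question's lines DataFrame):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lazy-demo").getOrCreate()

# Toy stand-in for the "lines" DataFrame from the question.
lines = spark.createDataFrame([(18, 385), (18, 2), (21, 400)], ["age", "friends"])

friendsByAge = lines.select("age", "friends")          # transformation: no job runs yet
averages = friendsByAge.groupBy("age").avg("friends")  # still no job
averages.explain()  # prints the query plan without executing anything
averages.show()     # only now is the data actually computed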
It also won't be cached automatically (that only happens if you explicitly cache() or persist() it), so you don't need to worry about cleaning it up. If you do cache a DataFrame called df, you can remove it from the cache using:
df.unpersist()
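Continuing the sketch above (same spark session and friendsByAge), here is one way the cache-then-release cycle can look; the reuse pattern itself is just an illustration:

friendsByAge.cache()                    # mark for caching; nothing is stored yet
friendsByAge.count()                    # first action materializes the cache
friendsByAge.groupBy("age").avg("friends").show()  # now served from the cache
friendsByAge.unpersist()                # free the cached blocks once done

If you never called cache() or persist(), there is nothing to unpersist; the DataFrame's results are simply recomputed on each action and discarded afterwards.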