I have an application that runs the same job on the same set of columns (though not necessarily the same row values) every day. Is there a way to save the Spark execution plan so that Spark does not have to recompute it every time?
My application involves thousands of transformations, and building the lineage graph and the optimized plan takes significant time.
CodePudding user response:
Is there a way I can save the spark execution plan without having spark recompute it every time?
I have never come across such a possibility, so I can say with a large dose of confidence that it is not an option.
What you can do instead is optimize the data that is the input to Spark: optimal partitioning, compression, and a file format that supports predicate pushdown are probably the places to look for time savings.