As RDDs are immutable - what will be the use case for emptyRDD

Time:02-24

rdd = sparkContext.emptyRDD() 

What is this method for? What can we do with an empty RDD?

Can anyone give a use case or an idea of where an empty RDD would be useful?


CodePudding user response:

A few cases:

  • If your method must return an RDD (rather than a null value) even when nothing matches, an emptyRDD is the right thing to return.

  • If you want to union 0 to n RDDs into a single one in a loop: start from the empty RDD, then do rdd = rdd.union(anotherOne) on each iteration.
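Both cases above can be sketched together. This is only a minimal illustration, not a canonical pattern: `union_all` and `demo` are hypothetical names, and `demo` assumes a local PySpark installation. The empty RDD acts as the starting value for the union, so the function still returns an RDD (never None) when the input list is empty.

```python
from functools import reduce


def union_all(sc, rdds):
    """Hypothetical helper: union zero or more RDDs into a single RDD."""
    # Start from an empty RDD and union each input onto it in turn.
    # With no inputs, the empty RDD itself is returned instead of None.
    return reduce(lambda acc, rdd: acc.union(rdd), rdds, sc.emptyRDD())


def demo():
    # Assumes PySpark is installed; imported here so the helper above
    # stays importable without Spark.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    parts = [sc.parallelize([1, 2]), sc.parallelize([3]), sc.parallelize([4, 5])]
    print(sorted(union_all(sc, parts).collect()))
    print(union_all(sc, []).isEmpty())
```

Callers can then chain transformations on the result without checking for None first.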

CodePudding user response:

Honestly, I have never used it, but I guess it is there because some transformations need an RDD as an argument, whether it is empty or not. Suppose you need to perform an outer join, and whether the RDD you are joining against is empty depends on a condition, like:

full_rdd.fullOuterJoin(another_full_rdd if condition else sparkContext.emptyRDD())

If the condition is not satisfied, the result contains pairs of the form (key, (full_rdd[key], None)). I think it is a more elegant way to perform a full join based on a condition. But, as I said, I have never needed something like that. I hope someone else finds better examples.
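To make the conditional join concrete, here is a small sketch. The names `conditional_full_join`, `use_right` and `demo` are made up for illustration, and `demo` assumes PySpark is installed locally.

```python
def conditional_full_join(sc, left, right, use_right):
    """Hypothetical helper: full outer join against `right` only if asked to."""
    # When the condition is false, join against an empty RDD instead,
    # so every key from `left` is paired with None on the right side.
    other = right if use_right else sc.emptyRDD()
    return left.fullOuterJoin(other)


def demo():
    # Assumes PySpark is installed; imported here so the helper above
    # stays importable without Spark.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    left = sc.parallelize([("a", 1), ("b", 2)])
    right = sc.parallelize([("b", 20)])
    print(sorted(conditional_full_join(sc, left, right, False).collect()))
    print(sorted(conditional_full_join(sc, left, right, True).collect()))
```

The join call stays in one place and only its argument changes, which avoids branching the whole pipeline on the condition.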
