Do Spark encoders respect Java's rules of inheritance?


My understanding: If I have a model class that extends a second model class, I shouldn't be able to access the private members of the parent class in the child class (unless I use reflection).

Extending this, I expect that when a Spark dataframe is encoded as a dataset of the child model class, it shouldn't have columns for the private members of the parent model class. (But this is not what I observe.)

More concretely, my parent class:

public class Foo {
    private int one;
    protected String two;
    protected double three;
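    // public getters and setters for all fields are assumed (omitted for brevity)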
}

The child class:

public class Bar extends Foo {
    private int four;
    protected String five;
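    // getters and setters again assumed (omitted for brevity)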
}

I have a couple of Bar objects that I use to create a Spark dataframe, i.e., a Dataset<Row>, like so:

Dataset<Row> barDF = session.createDataFrame(barList, Bar.class);

When, at a later point, I want to encode this as a dataset,

Dataset<Bar> barDS = barDF.as(Encoders.bean(Bar.class));

I expect barDS to have four columns (excluding one, the private member of Foo). But the result of barDS.show() is instead:

+----+----+---+-----+---+
|five|four|one|three|two|
+----+----+---+-----+---+
|   9|   9|  0|  3.0|  3|
|  16|  16|  0|  4.0|  4|
+----+----+---+-----+---+
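
For reference, a minimal self-contained sketch that reproduces this is below. The SparkSession setup and the concrete field values are my assumptions (chosen to match the output above), and it relies on the public getters/setters assumed on Foo and Bar:

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EncoderInheritanceRepro {
    public static void main(String[] args) {
        SparkSession session = SparkSession.builder()
                .appName("encoder-inheritance")
                .master("local[*]")
                .getOrCreate();

        // Field values are assumptions chosen to match the show() output
        // above; one is never set, so it keeps its default of 0.
        Bar b1 = new Bar();
        b1.setTwo("3"); b1.setThree(3.0); b1.setFour(9); b1.setFive("9");
        Bar b2 = new Bar();
        b2.setTwo("4"); b2.setThree(4.0); b2.setFour(16); b2.setFive("16");
        List<Bar> barList = Arrays.asList(b1, b2);

        // Same two steps as in the question.
        Dataset<Row> barDF = session.createDataFrame(barList, Bar.class);
        Dataset<Bar> barDS = barDF.as(Encoders.bean(Bar.class));
        barDS.show();

        session.stop();
    }
}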

What am I missing? Why does one show up in the dataset even though it is private to Foo? Also, what encoding can I use instead of bean encoding so that Java's rules of inheritance are respected?

CodePudding user response:

As @Ivo Beckers' comment helped clarify, Spark must be using reflection, but it looks like it goes through the getters and setters rather than the fields themselves. Bean encoding relies on JavaBean introspection, which only reports public getter/setter pairs, so if I declare the getter/setter for one as protected instead of public, the one column disappears and everything works as expected.
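
A sketch of what that fix looks like on Foo (the accessor names are assumed from the field names): since the introspector only sees public methods, demoting the accessors for one to protected hides that property:

public class Foo {
    private int one;
    protected String two;
    protected double three;

    // Protected accessors: JavaBean introspection only picks up public
    // getter/setter pairs, so "one" no longer appears as a column.
    protected int getOne() { return one; }
    protected void setOne(int one) { this.one = one; }

    public String getTwo() { return two; }
    public void setTwo(String two) { this.two = two; }

    public double getThree() { return three; }
    public void setThree(double three) { this.three = three; }
}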
