Spark SQL encoder for immutable data type


I've generally used immutable value types when writing Java code. Sometimes it's been through libraries (Immutables, AutoValue, Lombok), but mostly just vanilla Java classes with:

  • all final fields
  • a constructor with all fields as parameters
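
For example, a minimal sketch of the kind of class I mean (the class and field names here are just illustrative):

    // A hand-rolled immutable value type: all fields final,
    // one all-args constructor, getters only, no setters.
    public final class Trade {
        private final String symbol;
        private final double price;

        public Trade(String symbol, double price) {
            this.symbol = symbol;
            this.price = price;
        }

        public String getSymbol() { return symbol; }
        public double getPrice() { return price; }
    }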

(This question is for Java 11 and below, given current Spark support.)

In Spark SQL, data types require an Encoder. Using an off-the-shelf encoder like Encoders.bean(MyType.class) with such an immutable data type results in an "illegal reflective access operation" warning.
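
For context, the setup that triggers the warning looks roughly like this (a fragment, not a complete program; spark is assumed to be an existing SparkSession, and Trade is the immutable class sketched above):

    import java.util.List;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoder;
    import org.apache.spark.sql.Encoders;

    // Build a bean encoder for the immutable type and create a Dataset from it.
    Encoder<Trade> encoder = Encoders.bean(Trade.class);
    Dataset<Trade> trades =
        spark.createDataset(List.of(new Trade("ABC", 10.0)), encoder);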

I'm curious what the idiomatic Spark SQL (Dataset) approach is here. Obviously I could relax the constraints and make it a mutable POJO.


Update

Looking into the code for Encoders.bean, the target class really does have to be a classic, mutable POJO: the reflection code looks for appropriate setters. Further (and this is documented), the only supported collection types are array, List, and Map (not Set).
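
So the shape Encoders.bean appears to expect is roughly the classic bean below (a sketch based on my reading of the code; names are illustrative):

    import java.util.List;

    // Classic mutable bean: no-arg constructor plus getter/setter pairs.
    public class TradeBean {
        private String symbol;
        private double price;
        private List<String> venues; // List is supported; Set is not

        public TradeBean() {} // no-arg constructor

        public String getSymbol() { return symbol; }
        public void setSymbol(String symbol) { this.symbol = symbol; }

        public double getPrice() { return price; }
        public void setPrice(double price) { this.price = price; }

        public List<String> getVenues() { return venues; }
        public void setVenues(List<String> venues) { this.venues = venues; }
    }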

CodePudding user response:

This was actually a misdiagnosis. The immutability of my data type was not causing the reflective access issues; it was a Java 11 issue (noted, among other places, here: https://github.com/renaissance-benchmarks/renaissance/issues/241).

By adding the following JVM arguments, everything works correctly:

    --illegal-access=deny --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED
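
If you launch through spark-submit rather than a single local JVM, the same flags presumably need to reach both the driver and executor JVMs. Something like this untested sketch, using Spark's standard extraJavaOptions settings (the app class and jar are placeholders):

    # Pass the JVM flags to both driver and executor JVMs at launch time.
    JVM_OPTS="--illegal-access=deny --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED"
    spark-submit \
      --conf "spark.driver.extraJavaOptions=$JVM_OPTS" \
      --conf "spark.executor.extraJavaOptions=$JVM_OPTS" \
      --class com.example.MyApp myapp.jar  # placeholder app class and jar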
