I want to convert a Spark dataframe to a dataset of a POJO whose field names differ from the column names. I have a dataframe with the fields name and date_of_birth, whose types are IntegerType and DateType.
And a POJO of:
public class Person implements Serializable {
private Integer name;
private Date dateOfBirth;
}
I can convert it to a dataset successfully with the following code:
Encoder<Person> personEncoder = Encoders.bean(Person.class);
Dataset<Person> personDS = result.as(personEncoder);
List<Person> personList = personDS.collectAsList();
However, this only works if I first rename the dataframe's columns to match the field names of the Person POJO. Is there a way to tell Spark how to map the columns to the fields from the POJO side?
I thought about Gson's @SerializedName("date_of_birth"), but it didn't have any effect.
CodePudding user response:
If you have a name mapping, say in a Map, you could use it to rename the columns before converting the dataframe into a dataset.
It could be written like this:
// Maps dataframe column names (keys) to Person field names (values).
// It is built by hand here, but it could be read from a config file, for instance.
Map<String, String> nameMapping = new java.util.HashMap<>();
nameMapping.put("name", "name");
nameMapping.put("date_of_birth", "dateOfBirth");
// Alias each column to its POJO field name (col is org.apache.spark.sql.functions.col).
Column[] renamedColumns = nameMapping
    .entrySet()
    .stream()
    .map(x -> col(x.getKey()).alias(x.getValue()))
    .toArray(Column[]::new);
Dataset<Person> personDS = result.select(renamedColumns).as(personEncoder);
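If the columns follow snake_case and the POJO fields follow camelCase, as they do in the question, the map does not have to be written by hand. The following is only a sketch under that assumption: ColumnNameMapping, toSnakeCase and forBean are hypothetical names, not part of Spark, and the Person class here is a stand-in for the one in the question. It uses plain Java reflection to derive the column-to-field mapping from the POJO.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.sql.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Spark: derives a column-to-field name
// mapping from a bean class, assuming snake_case columns and camelCase fields.
final class ColumnNameMapping {

    private ColumnNameMapping() {
    }

    // Converts a camelCase name to snake_case: each upper-case letter is
    // lower-cased and, unless it is the first character, prefixed with '_'.
    static String toSnakeCase(String camelCase) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < camelCase.length(); i++) {
            char c = camelCase.charAt(i);
            if (Character.isUpperCase(c)) {
                if (i > 0) {
                    sb.append('_');
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds a map from the assumed column name (key) to the field name (value)
    // for every non-static, non-synthetic field declared on the bean class.
    static Map<String, String> forBean(Class<?> beanClass) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (Field field : beanClass.getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            mapping.put(toSnakeCase(field.getName()), field.getName());
        }
        return mapping;
    }
}

// Stand-in for the Person POJO from the question.
class Person implements java.io.Serializable {
    private Integer name;
    private Date dateOfBirth;
}
```

The map returned by ColumnNameMapping.forBean(Person.class) could then be used in place of the hand-written nameMapping in the renaming code above. Columns that do not follow the naming convention would still need an explicit entry.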
CodePudding user response:
I am not aware of any annotation that does this. However, here is how I'd solve it.
I would create a specific dataframe with the shape I want, then export it.
It would look like:
Dataset<Row> exportDf = df
.withColumn("dateOfBirth",
col("date_of_birth").cast(DataTypes.StringType))
.drop("date_of_birth");
The full example I wrote can be found here: https://github.com/jgperrin/net.jgp.labs.spark/tree/master/src/main/java/net/jgp/labs/spark/l999_scrapbook/l002.
Notes:
- I am assuming that result in your code is a Dataset<Row>.
- I used String for your date, as Spark was a little touchy about converting a Date to a String in a POJO. If you need help specifically with this issue, create another SO question and I'll happily look at it.