How to create a struct column from a list of column names in Spark with Java?

I have a DataFrame with multiple columns, e.g.

root
 |-- playerName
 |-- country
 |-- bowlingAvg
 |-- bowlingSR
 |-- wickets
 |-- battingAvg
 |-- battingSR
 |-- runs

I also have a list of the column names that correspond to the bowling stats:

List<String> bowlingParams = new ArrayList<>(Arrays.asList("bowlingAvg", "bowlingSR", "wickets"));

Expected Schema:

root
 |-- playerName
 |-- country
 |-- bowlingAvg
 |-- bowlingSR
 |-- wickets
 |-- battingAvg
 |-- battingSR
 |-- runs
 |-- bowlingStats 
       |-- bowlingAvg
       |-- bowlingSR
       |-- wickets

I can do it by listing the columns explicitly:

playerDF = playerDF.withColumn("bowlingStats", functions.struct("bowlingAvg", "bowlingSR", "wickets"));

However, I want to use the list to select the columns for the struct dynamically.

I know this can be done in Scala like this:

playerDF = playerDF.select(struct(bowlingParams.map(col): _*))

and I have also found a reference on how to do this in Python.

Is there a way to do this in Java with Spark?

Answer:

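For Java, this solution worked for me:

Remove one attribute (a fixed, non-dynamic one) from the list.
Convert the remaining list to a Scala Seq using JavaConverters.
When creating the nested column, pass that one attribute (as a string) and the converted Scala Seq to functions.struct.

This relies on the Scala signature struct(colName: String, colNames: String*), which Java sees as struct(String, Seq<String>).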

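import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.functions;
import scala.collection.JavaConverters;

// "bowlingAvg" is passed on its own, so it is left out of the list
List<String> bowlingParams = Arrays.asList("bowlingSR", "wickets");

// Uses the struct(String colName, Seq<String> colNames) overload
playerDF = playerDF.withColumn("bowlingStats", functions.struct("bowlingAvg",
        JavaConverters.asScalaIteratorConverter(bowlingParams.iterator()).asScala().toSeq()));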
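A possibly simpler alternative, sketched under some assumptions: functions.struct also has a Column... overload that is callable from Java, so the whole list can be mapped to Column objects with a stream and passed as an array. This avoids both the Scala conversion and the need to take one name out of the list first. The class name BowlingStatsExample, the helper methods structFromNames and withStruct, and the sample rows are hypothetical; the sketch assumes spark-sql is on the classpath and runs against a local master.

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.sql.Column;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.functions;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    public class BowlingStatsExample {

        // Hypothetical helper: builds struct(col(n1), col(n2), ...) from a list of names.
        // The Column[] is passed to the Java varargs form functions.struct(Column...).
        public static Column structFromNames(List<String> names) {
            Column[] cols = names.stream()
                    .map(functions::col)
                    .toArray(Column[]::new);
            return functions.struct(cols);
        }

        // Hypothetical helper: adds a struct column that nests the listed columns.
        public static Dataset<Row> withStruct(Dataset<Row> df, String structName, List<String> names) {
            return df.withColumn(structName, structFromNames(names));
        }

        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("struct-from-list")
                    .master("local[1]")
                    .getOrCreate();

            // Made-up sample data using a subset of the question's columns
            StructType schema = new StructType()
                    .add("playerName", DataTypes.StringType)
                    .add("bowlingAvg", DataTypes.DoubleType)
                    .add("bowlingSR", DataTypes.DoubleType)
                    .add("wickets", DataTypes.IntegerType);
            List<Row> rows = Arrays.asList(
                    RowFactory.create("Player A", 24.5, 30.1, 120),
                    RowFactory.create("Player B", 28.0, 35.4, 85));
            Dataset<Row> playerDF = spark.createDataFrame(rows, schema);

            // The full list is used; no element has to be removed first
            List<String> bowlingParams = Arrays.asList("bowlingAvg", "bowlingSR", "wickets");
            playerDF = withStruct(playerDF, "bowlingStats", bowlingParams);

            // The schema should now include bowlingStats as a struct of the three listed columns
            playerDF.printSchema();
            spark.stop();
        }
    }

If you prefer to stay with column names as strings, the String... overload can also take the list directly after splitting off the first element, e.g. functions.struct(bowlingParams.get(0), bowlingParams.subList(1, bowlingParams.size()).toArray(new String[0])), which likewise needs no JavaConverters.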