Concatenating Dataset columns in Apache Spark using Java by passing an array with column names as argument

Time:06-17

Here is what I want to achieve using Java and Spark.

I have an array of column names as below.

String[] col_arr = new String[] { "colname_1", "colname_2"};

I want to concatenate the two columns by passing the array (with the column names as its elements) to the concat function.

Dataset<Row> new_abc = dataset_abc.withColumn("new_concat_Column", concat(col_arr));

The code below works, but I do not want to pass the column names explicitly; instead I want to pass the array containing the column names as its elements.

Dataset<Row> new_abc = dataset_abc.withColumn("new_concat_Column", concat(col("colname_1"), col("colname_2")));

CodePudding user response:

Since concat takes Column... varargs, you can pass an array of columns (a Column[]) to it directly, like so:

Column[] columnArray = { col("column1"), col("column2") };
Dataset<Row> concatenatedDS = dataset.withColumn("concatenated_column", concat(columnArray));

If you only have a String[] of column names, you can build the Column[] from it dynamically.
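For instance, a minimal sketch of that conversion, mapping each name through Spark's functions.col and collecting into a Column[] (assumes Spark SQL is on the classpath; the class and method names here are illustrative):

```java
import java.util.Arrays;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.functions;

public class ColumnArrays {
    // Turn a String[] of column names into a Column[],
    // mapping each name through functions.col
    static Column[] toColumns(String[] names) {
        return Arrays.stream(names)
                     .map(functions::col)
                     .toArray(Column[]::new);
    }

    public static void main(String[] args) {
        String[] col_arr = { "colname_1", "colname_2" };
        Column[] cols = toColumns(col_arr);
        // The resulting array can then be passed to concat, e.g.:
        // dataset_abc.withColumn("new_concat_Column", functions.concat(cols));
        System.out.println(cols.length + " columns built");
    }
}
```

This keeps the column list data-driven: only the String[] needs to change when the set of columns to concatenate changes.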
