How to write a function that takes a list of DataFrame column names and moves those columns to the front

Time:02-18

I'd like to build a function

def reorderColumns(columnNames: List[String]) = ...

that can be applied to a Spark DataFrame so that the columns listed in columnNames are moved to the left, in the order given, while the remaining columns (in any order) stay to the right.

Example: Given a df with the following 5 columns

| A | B | C | D | E |

df.reorderColumns(List("D", "B", "A")) returns a df with columns ordered like so:

| D | B | A | C | E |

CodePudding user response:

Try this one:

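import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

def reorderColumns(df: DataFrame, columns: Array[String]): DataFrame = {
  // Columns not named in `columns`, kept in their original order
  val restColumns: Array[String] = df.columns.filterNot(c => columns.contains(c))
  df.select((columns ++ restColumns).map(col): _*)
}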
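
If you want the df.reorderColumns(...) call syntax from the question, one option is to wrap the same logic in an implicit class. This is only a minimal sketch, assuming Spark is on the classpath; the names ReorderSyntax and DataFrameOps are made up for illustration and are not part of Spark.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object ReorderSyntax {
  // Hypothetical wrapper: adds reorderColumns as a method on DataFrame.
  implicit class DataFrameOps(private val df: DataFrame) extends AnyVal {
    def reorderColumns(columnNames: Seq[String]): DataFrame = {
      // Columns that were not requested, in their original order
      val rest = df.columns.filterNot(columnNames.contains)
      // Requested columns first, then everything else
      df.select((columnNames ++ rest).map(col): _*)
    }
  }
}
```

After import ReorderSyntax._ you can call df.reorderColumns(Seq("D", "B", "A")) directly on any DataFrame.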

Usage example:

import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder().appName("test").master("local[*]").getOrCreate()

import spark.implicits._
val df = List((1, 3, 1, 6), (2, 4, 2, 5), (3, 6, 3, 4)).toDF("colA", "colB", "colC", "colD")
reorderColumns(df, Array("colC", "colB")).show

// output:
//+----+----+----+----+
//|colC|colB|colA|colD|
//+----+----+----+----+
//|   1|   3|   1|   6|
//|   2|   4|   2|   5|
//|   3|   6|   3|   4|
//+----+----+----+----+
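
Note that select fails with an AnalysisException if a requested name is not a column of the DataFrame, and a name listed twice ends up selected twice. If that matters for your data, you could compute the target order on plain strings first. A rough sketch, where ColumnOrder and reorderedNames are hypothetical names and silently ignoring unknown names is just one possible policy:

```scala
object ColumnOrder {
  // Hypothetical helper: works on column names only, so it needs no Spark.
  def reorderedNames(all: Seq[String], first: Seq[String]): Seq[String] = {
    // Keep only requested names that actually exist, dropping repeats
    val front = first.distinct.filter(all.contains)
    // Remaining columns keep their original relative order
    front ++ all.filterNot(front.contains)
  }
}
```

The result can then be passed to select, for example df.select(ColumnOrder.reorderedNames(df.columns.toSeq, Seq("D", "B", "A")).map(col): _*).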