I'd like to build a function
def reorderColumns(columnNames: List[String]) = ...
that can be applied to a Spark DataFrame such that the columns specified in columnNames
gets reordered to the left, and remaining columns (in any order) remain to the right.
Example: Given a df with the following 5 columns
| A | B | C | D | E
df.reorderColumns(["D","B","A"]) returns a df with columns ordered like so:
| D | B | A | C | E
CodePudding user response:
Try this one:
def reorderColumns(df: DataFrame, columns: Array[String]): DataFrame = {
val restColumns: Array[String] = df.columns.filterNot(c => columns.contains(c))
df.select((columns restColumns).map(col): _*)
}
Usage example:
val spark: SparkSession = SparkSession.builder().appName("test").master("local[*]").getOrCreate()
import spark.implicits._
val df = List((1, 3, 1, 6), (2, 4, 2, 5), (3, 6, 3, 4)).toDF("colA", "colB", "colC", "colD")
reorderColumns(df, Array("colC", "colB")).show
// output:
// ---- ---- ---- ----
//|colC|colB|colA|colD|
// ---- ---- ---- ----
//| 1| 3| 1| 6|
//| 2| 4| 2| 5|
//| 3| 6| 3| 4|
// ---- ---- ---- ----