I need to apply a transformation to all the Integer columns of my DataFrame before writing a CSV. The transformation consists of casting the type to String and then changing the format to the European one (e.g. 1234567 -> "1234567" -> "1.234.567").
Does Spark have any way to apply this transformation to all the Integer columns? I want it to be generic (because I need to write multiple CSVs) instead of hardcoding the columns to transform for each DataFrame.
CodePudding user response:
DataFrame has a dtypes method, which returns the column names along with their data types as an Array[(String, String)]: one ("column name", "data type") pair per column.
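For example, for the df built below, df.dtypes returns:

Array((int_1,IntegerType), (int_2,IntegerType), (str_3,StringType))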
You can map over this array, applying a different expression to each column based on its data type, and then pass the mapped list to the select method:
import spark.implicits._
import org.apache.spark.sql.functions._
val dataSeq = Seq(
  (1246984, 993922, "test_1"),
  (246984, 993922, "test_2"),
  (246984, 993922, "test_3"))
val df = dataSeq.toDF("int_1", "int_2", "str_3")
df.show
+-------+------+------+
|  int_1| int_2| str_3|
+-------+------+------+
|1246984|993922|test_1|
| 246984|993922|test_2|
| 246984|993922|test_3|
+-------+------+------+
val columns = df.dtypes.map {
  // Integer columns: add thousands separators, then swap "," for "."
  case (c, "IntegerType") => regexp_replace(format_number(col(c), 0), ",", ".").as(c)
  // Leave all other columns untouched
  case (c, _) => col(c)
}
val df2 = df.select(columns: _*)
df2.show
+---------+-------+------+
|    int_1|  int_2| str_3|
+---------+-------+------+
|1.246.984|993.922|test_1|
|  246.984|993.922|test_2|
|  246.984|993.922|test_3|
+---------+-------+------+
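Since you need this for several CSVs, you can wrap the mapping in a small helper and call it right before each write. A minimal sketch (the name withEuropeanInts, the header option, and the output path are just placeholders):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Convert every IntegerType column to a dot-separated String column; leave the rest as-is
def withEuropeanInts(df: DataFrame): DataFrame =
  df.select(df.dtypes.map {
    case (c, "IntegerType") => regexp_replace(format_number(col(c), 0), ",", ".").as(c)
    case (c, _)             => col(c)
  }: _*)

// Before each CSV write (path is a placeholder):
withEuropeanInts(df).write.option("header", "true").csv("/tmp/output_csv")

If some of your DataFrames also contain LongType columns, a matching case for "LongType" can be added the same way.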