Round values of a varying quantity of columns on Databricks Scala


I am using Scala on Databricks and:

  • I have a dataframe that has N columns.
  • All columns except the first Y are of type "float" and hold numbers that I want to round to 0 decimals.

I don't want to write one specific line of code for each column that needs to be rounded, because there may be many such columns and they vary.

In order to do that, I tried to create a function with Map (not sure if it is the best option):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, round}

def roundValues(precision: Int)(df: DataFrame): DataFrame = {
  // applies round() to every column, which is why it fails on string/date columns
  val roundedCols = df.columns.map(c => round(col(c), precision).as(c))
  df.select(roundedCols: _*)
}

df.transform(roundValues(0))

But I always get an error because the first Y columns are strings, dates, or other types.

My questions:

  1. How can I round the values on all of the necessary columns?
  2. The number of leading Y columns may vary, as may the number of N-Y columns that I need to round. Is there a way to avoid manually listing the names of the columns that need to be rounded? (e.g., round only the columns of type float and ignore all others)
  3. In the end, should I convert from float to another type? I am going to use the final dataframe for some plots and simple calculations, and I won't need decimals for those.

CodePudding user response:

You can get the data type of each column from the dataframe's schema:

import org.apache.spark.sql.functions.{col, round}
import org.apache.spark.sql.types.FloatType

// names of the columns whose data type is FloatType
val floatColumns = df.schema.fields.filter(_.dataType == FloatType).map(_.name)

// round the float columns to 0 decimals and pass every other column through unchanged
val selectExpr = df.columns.map(c =>
  if (floatColumns.contains(c)) round(col(c), 0).as(c)
  else col(c)
)

val df1 = df.select(selectExpr: _*)
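
If you also want to address question 3 (no decimals needed afterwards), the same idea can be wrapped into a reusable transform like the original roundValues, with the rounded columns cast to an integer type. This is only a sketch under my own assumptions: the helper name roundFloatColumns and the cast to IntegerType are not from the original post, and the cast assumes the rounded values fit in an Int.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, round}
import org.apache.spark.sql.types.{FloatType, IntegerType}

// Hypothetical helper: rounds every FloatType column to 0 decimals and casts it
// to IntegerType (assumes the rounded values fit in an Int); other columns pass through.
def roundFloatColumns(df: DataFrame): DataFrame = {
  val floatCols = df.schema.fields.filter(_.dataType == FloatType).map(_.name).toSet
  val exprs = df.columns.map { c =>
    if (floatCols.contains(c)) round(col(c), 0).cast(IntegerType).as(c)
    else col(c)
  }
  df.select(exprs: _*)
}

val rounded = df.transform(roundFloatColumns)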