Home > database >  how to create range column based on a column value?
how to create range column based on a column value?

Time:02-24

I have sample data in table which contains distance_travelled_in_meter, where the values are of Integer type as follows:

distance_travelled_in_meter | 
--------------------------- |
                        500 | 
                        1221|
                        990 |
                         575|

I want to create range based on the value of the column distance_travelled_in_meter. Range column has values with 500 intervals. The result dataset is as follows:

distance_travelled_in_meter | range
--------------------------- |---------
                        500 | 1-500
                        1221|1000-1500
                        990 |500-1000
                         575|500-1000

For value 500, the range is 1-500 as it is within 500 meter, 1221 is in 1000-1500 and so on..

I tried using Spark.sql.functions.sequence but it takes the start and stop column values which is not what I want and want to be in range that I mentioned above. And also it creates an Range array from start column value to stop column value.

I'm using Spark2.4.2 with Scala 2.11.12 Any help is much appreciated.

CodePudding user response:

You can chain multiple when expressions that you generate dynamically using something like this:

val maxDistance = 1221 // you can get this from the dataframe

val ranges = (0 until maxDistance by 500).map(x => (x, x   500))

val rangeExpr = ranges.foldLeft(lit(null)) {
  case (acc, (lowerBound, upperBound)) =>
    when(
      col("distance_travelled_in_meter").between(lowerBound, upperBound),
      lit(s"$lowerBound-$upperBound")
    ).otherwise(acc)
}

val df1 = df.withColumn("range", rangeExpr)
  • Related