I have sample data in table which contains distance_travelled_in_meter
, where the values are of Integer type as follows:
distance_travelled_in_meter |
--------------------------- |
500 |
1221|
990 |
575|
I want to create range based on the value of the column distance_travelled_in_meter
. Range column has values with 500
intervals.
The result dataset is as follows:
distance_travelled_in_meter | range
--------------------------- |---------
500 | 1-500
1221|1000-1500
990 |500-1000
575|500-1000
For value 500
, the range is 1-500
as it is within 500 meter, 1221
is in 1000-1500
and so on..
I tried using Spark.sql.functions.sequence
but it takes the start and stop column values which is not what I want and want to be in range that I mentioned above. And also it creates an Range array from start column value to stop column value.
I'm using Spark2.4.2 with Scala 2.11.12 Any help is much appreciated.
CodePudding user response:
You can chain multiple when
expressions that you generate dynamically using something like this:
val maxDistance = 1221 // you can get this from the dataframe
val ranges = (0 until maxDistance by 500).map(x => (x, x 500))
val rangeExpr = ranges.foldLeft(lit(null)) {
case (acc, (lowerBound, upperBound)) =>
when(
col("distance_travelled_in_meter").between(lowerBound, upperBound),
lit(s"$lowerBound-$upperBound")
).otherwise(acc)
}
val df1 = df.withColumn("range", rangeExpr)