How to bin scala collection into subsets based upon bin range values

Time:11-19

I have a very large collection of case class instances, each with a String attribute and a Double attribute, like:

case class Sample(id:String, value: Double)

val samples: List[Sample] = List(
  Sample("a", 0), 
  Sample("b", 2), 
  Sample("c", 20), 
  Sample("d", 50), 
  Sample("e", 100), 
  Sample("f", 1000)
)

Given a list of buckets such as:

val buckets = List(5, 50, 100)

what would be the best way to produce a list of subsets like:

List(
  List(Sample("a", 0)), // samples with value 0
  List(Sample("b", 2)), // samples with value > 0 and <= 5
  List(Sample("c", 20), Sample("d", 50)), // samples with value > 5 and <= 50
  List(Sample("e", 100)), // samples with value > 50 and <= 100
  List(Sample("f", 1000)) // samples with value > 100
)

CodePudding user response:

Add 0 explicitly as a bucket boundary, use binary search to find the right bucket in O(log(numBuckets)) per sample, and group the samples with groupBy:

val buckets = List[Double](0, 5, 50, 100)

val indexFinder: Double => Int = {
  val arr = buckets.toArray
  (value: Double) => arr.search(value).insertionPoint
}

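// groupBy returns a Map with no guaranteed iteration order, so sort by bucket index before printing
samples.groupBy(sample => indexFinder(sample.value)).toList.sortBy(_._1).map(_._2).foreach(println)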

gives:


List(Sample(a,0.0))
List(Sample(b,2.0))
List(Sample(c,20.0), Sample(d,50.0))
List(Sample(e,100.0))
List(Sample(f,1000.0))
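
To see why the explicit 0 boundary produces the requested ranges, here is a minimal sketch of how the binary search maps values to bucket indices. The names `boundaries` and `bucketIndex` are made up for this illustration, and a Vector is used in place of the Array above; the import is only needed on Scala 2.12.

```scala
import scala.collection.Searching._ // required on Scala 2.12; not needed on 2.13

// Sorted, distinct bucket boundaries (same values as in the answer).
val boundaries = Vector(0.0, 5.0, 50.0, 100.0)

// search returns Found(i) on an exact boundary hit and InsertionPoint(i) otherwise;
// insertionPoint exposes the index in both cases, so each upper bound is inclusive.
def bucketIndex(value: Double): Int = boundaries.search(value).insertionPoint

val indices = List(0.0, 2.0, 5.0, 5.1, 100.0, 1000.0).map(bucketIndex)
// indices == List(0, 1, 1, 2, 3, 4)
```

A value exactly on a boundary (5.0) stays in the bucket that ends at that boundary, while anything above the last boundary gets index 4. Negative values, if any occurred, would share index 0 with the zero values.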

Full code:

case class Sample(id:String, value: Double)

val samples: List[Sample] = List(
  Sample("a", 0), 
  Sample("b", 2), 
  Sample("c", 20), 
  Sample("d", 50), 
  Sample("e", 100), 
  Sample("f", 1000)
)

val buckets = List[Double](0, 5, 50, 100)

val indexFinder: Double => Int = {
  val arr = buckets.toArray
  (value: Double) => arr.search(value).insertionPoint
}

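// groupBy returns a Map with no guaranteed iteration order, so sort by bucket index before printing
samples.groupBy(sample => indexFinder(sample.value)).toList.sortBy(_._1).map(_._2).foreach(println)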
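
Note that groupBy only yields buckets that contain at least one sample. If empty buckets should appear in the result as well, one possible variant is sketched below. `binSamples` is a hypothetical helper, not part of the answer above, and it assumes the boundaries are sorted ascending without duplicates.

```scala
import scala.collection.Searching._ // required on Scala 2.12; not needed on 2.13

case class Sample(id: String, value: Double)

// Hypothetical helper: one list per bucket, in bucket order, empty buckets included.
// Assumes `boundaries` is sorted ascending and contains no duplicates.
def binSamples(samples: List[Sample], boundaries: Vector[Double]): Vector[List[Sample]] = {
  val grouped = samples.groupBy(s => boundaries.search(s.value).insertionPoint)
  // Indices run from 0 to boundaries.length; the last index is the overflow bucket.
  Vector.tabulate(boundaries.length + 1)(i => grouped.getOrElse(i, Nil))
}

val binned = binSamples(
  List(Sample("a", 0), Sample("b", 2), Sample("f", 1000)),
  Vector(0.0, 5.0, 50.0, 100.0)
)
// binned has 5 entries; the buckets for (5, 50] and (50, 100] are empty lists
```

Because the result is built by index rather than by iterating the Map, the bucket order is fixed regardless of how groupBy orders its keys.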

