I have a very large collection of case class instances, each with a String attribute and a Double attribute, like:
case class Sample(id: String, value: Double)
val samples: List[Sample] = List(
Sample("a", 0),
Sample("b", 2),
Sample("c", 20),
Sample("d", 50),
Sample("e", 100),
Sample("f", 1000)
)
Given a list of buckets such as:
val buckets = List(5, 50, 100)
what would be the best way to produce a list of subsets like:
List(
List(Sample("a", 0)), // samples with value == 0
List(Sample("b", 2)), // samples with value > 0 && <= 5
List(Sample("c", 20), Sample("d", 50)), // samples with value > 5 && <= 50
List(Sample("e", 100)), // samples with value > 50 && <= 100
List(Sample("f", 1000)) // samples with value > 100
)
CodePudding user response:
Add 0 explicitly as a bucket boundary, use binary search to find the right bucket in O(log(numBuckets)), and use groupBy:
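import scala.collection.Searching._ // needed on Scala 2.12; on 2.13 `search` is available without it
val buckets = List[Double](0, 5, 50, 100)
val indexFinder: Double => Int = {
  val arr = buckets.toArray // binary search needs a sorted, indexed sequence
  (value: Double) => arr.search(value).insertionPoint
}
// groupBy returns a Map with no guaranteed key order, so sort by bucket index before printing
samples.groupBy(sample => indexFinder(sample.value)).toList.sortBy(_._1).map(_._2).foreach(println)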
gives:
List(Sample(a,0.0))
List(Sample(b,2.0))
List(Sample(c,20.0), Sample(d,50.0))
List(Sample(e,100.0))
List(Sample(f,1000.0))
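To see why a value equal to a boundary lands in the same bucket as the values just below it, it may help to look at what `search` returns. This is a minimal sketch, assuming Scala 2.12 or 2.13; it uses a `Vector` instead of the array above, and the name `boundaries` is only for illustration:

```scala
import scala.collection.Searching._ // required on Scala 2.12; `search` is built in on 2.13

val boundaries = Vector(0.0, 5.0, 50.0, 100.0)

// An exact hit yields Found(i); a miss yields InsertionPoint(i), where i is the
// index at which the value would have to be inserted to keep the vector sorted.
val hit  = boundaries.search(50.0)   // Found(2)
val miss = boundaries.search(20.0)   // InsertionPoint(2)
val over = boundaries.search(1000.0) // InsertionPoint(4)

// `insertionPoint` is defined on both result types, so 20 and 50 share bucket index 2.
println(List(hit, miss, over).map(_.insertionPoint)) // List(2, 2, 4)
```

Note that anything below the first boundary (a negative value, for instance) would also get index 0 and end up in the same bucket as 0.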
Full code:
case class Sample(id:String, value: Double)
val samples: List[Sample] = List(
Sample("a", 0),
Sample("b", 2),
Sample("c", 20),
Sample("d", 50),
Sample("e", 100),
Sample("f", 1000)
)
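import scala.collection.Searching._ // needed on Scala 2.12; on 2.13 `search` is available without it
val buckets = List[Double](0, 5, 50, 100)
val indexFinder: Double => Int = {
  val arr = buckets.toArray // binary search needs a sorted, indexed sequence
  (value: Double) => arr.search(value).insertionPoint
}
// groupBy returns a Map with no guaranteed key order, so sort by bucket index before printing
samples.groupBy(sample => indexFinder(sample.value)).toList.sortBy(_._1).map(_._2).foreach(println)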
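One thing `groupBy` does not do is report buckets that received no samples. If empty buckets should show up as empty lists, so that positions stay aligned with the boundaries, a variation along these lines might work. `bucketize` is a hypothetical helper name, not a library function, and the sketch assumes the boundary list already contains 0:

```scala
import scala.collection.Searching._ // required on Scala 2.12; `search` is built in on 2.13

case class Sample(id: String, value: Double)

// Hypothetical helper: returns one list per bucket, in boundary order,
// with an empty list for every bucket that received no samples.
def bucketize(samples: List[Sample], boundaries: List[Double]): List[List[Sample]] = {
  val sorted = boundaries.sorted.toVector // binary search requires sorted input
  val grouped = samples.groupBy(s => sorted.search(s.value).insertionPoint)
  // Indices run from 0 to sorted.length; the last one collects values
  // above the largest boundary.
  (0 to sorted.length).toList.map(i => grouped.getOrElse(i, Nil))
}

val result = bucketize(
  List(Sample("a", 0.0), Sample("b", 2.0), Sample("f", 1000.0)),
  List(0.0, 5.0, 50.0, 100.0)
)
result.foreach(println)
```

Here the third and fourth entries of `result` are empty lists, because no sample falls into (5, 50] or (50, 100].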