Data structure to represent mapping of intervals

Time:04-26

Here is a function that deduces a person's status from their age:

def getStatus(age: Int): String = {
  age match {
    case age if 0 until 2 contains age => "infant"
    case age if 2 until 10 contains age => "child"
    case age if 10 until 18 contains age => "teen"
    case _ => "adult"
  }
}

Let's say the boundaries can change: we might decide, for example, that a person counts as an infant until they are 3 years old. Because the boundaries change, we don't want them hard-coded; they will be stored externally.

What data structure can store mappings keyed by intervals?

Something like:

val intervalMap = IntervalMap(
  (0, 2) -> "infant",
  (2, 10) -> "child",
  (10, 18) -> "teen",
  (18, 200) -> "adult"
)
intervalMap(1) // "infant"
intervalMap(12) // "teen"

I'm developing in Scala, but a language-agnostic answer would be much appreciated.

Answer:

If I wanted to do this, I would first create an AgeStatus enumeration representing "infant", "child", etc. This doesn't change the logic; it just makes the code nicer to work with:

object AgeStatus extends Enumeration {
  type AgeStatus = Value
  val infant, child, teen, adult = Value // just an enumeration of age state
} // in a separate file

Then I would define a trait representing an age interval. Applying it to an age tells you whether the age belongs to the interval, and a second method returns the age status of that interval.

sealed trait Interval {
  def apply(age: Int): Boolean   // true if `age` falls inside this interval
  def state: AgeStatus.AgeStatus // the status this interval maps to
}

Now I'll define a case class representing an age interval. It has two fields for the interval bounds, from and toOpt. The upper bound is exclusive, and it is optional because the adult range has no upper bound.

// `toOpt` is an exclusive upper bound; None means unbounded (the adult range)
case class AgeInterval(from: Int, toOpt: Option[Int], ageState: AgeStatus.AgeStatus) extends Interval {
  override def apply(age: Int): Boolean =
    age >= from && toOpt.forall(age < _)

  override def state: AgeStatus.AgeStatus = ageState
}

Now let's define the companion object for the trait. It keeps the age intervals in one place and provides a method to get the status for an age easily. I'll create private vals for the intervals we already know (note that this part of the answer only deals with modeling; it doesn't solve the hard-coding issue, which I'll explain later).

object Interval {
  private final val infantInterval = AgeInterval(0, Some(2), AgeStatus.infant)
  private final val childInterval = AgeInterval(2, Some(10), AgeStatus.child)
  private final val teenInterval = AgeInterval(10, Some(18), AgeStatus.teen)
  private final val adultInterval = AgeInterval(18, None, AgeStatus.adult)
  private final val allIntervals: Seq[Interval] =
    Seq(infantInterval, childInterval, teenInterval, adultInterval)

  // Age can be -1; rather than throwing, return None when no interval matches
  def getStatus(age: Int): Option[AgeStatus.AgeStatus] = allIntervals.collectFirst {
    case interval if interval(age) => interval.state
  }
}

So now I can just call Interval.getStatus(10) and I'll get Some(teen). The hard-coding problem still exists, but the fix is simple: instead of creating final vals, use a method or something similar that reads the interval bounds from some data source.
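As a rough sketch of that last suggestion, the block below builds the intervals from text lines at runtime instead of from final vals. The "status,from,to" line format (an empty "to" meaning no upper bound) and the names LoadedInterval and IntervalLoader are hypothetical; in practice the lines would come from whatever external store holds the boundaries. Status names are mapped back to the enumeration with Enumeration.withName, which throws NoSuchElementException for unknown names.

```scala
// Self-contained sketch: the Enumeration mirrors the answer's AgeStatus; the
// "status,from,to" line format and the names below are hypothetical.
object AgeStatus extends Enumeration {
  type AgeStatus = Value
  val infant, child, teen, adult = Value
}

// `to` is an exclusive upper bound; None means no upper bound.
final case class LoadedInterval(from: Int, to: Option[Int], status: AgeStatus.AgeStatus) {
  def contains(age: Int): Boolean = age >= from && to.forall(age < _)
}

object IntervalLoader {
  // Parses lines such as "infant,0,3" or "adult,18,"; limit -1 keeps the trailing empty field.
  def parse(lines: Seq[String]): Seq[LoadedInterval] = lines.map { line =>
    line.split(",", -1).map(_.trim) match {
      case Array(name, from, to) =>
        val upper = if (to.isEmpty) None else Some(to.toInt)
        LoadedInterval(from.toInt, upper, AgeStatus.withName(name)) // throws on unknown names
      case _ =>
        throw new IllegalArgumentException(s"Malformed interval line: $line")
    }
  }

  // Same first-match lookup as Interval.getStatus, over intervals supplied at runtime.
  def getStatus(intervals: Seq[LoadedInterval], age: Int): Option[AgeStatus.AgeStatus] =
    intervals.collectFirst { case i if i.contains(age) => i.status }
}
```

With this shape, moving the infant boundary to 3 is a data change ("infant,0,3", "child,3,10", ...) rather than a code change.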

Answer:

Easy Answer

There isn't anything in the Scala standard library that does that, but if the number of "categories" is low like in your example, there's no harm in implementing a naive O(N) apply method on your IntervalMap class.

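case class IntervalMap[V](categories: ((Int, Int), V)*) {
  // Linear scan: first entry whose half-open range [min, max) contains `in`
  def apply(in: Int): Option[V] = categories.collectFirst {
    case ((min, max), value) if in >= min && in < max => value
  }
}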

Guava

The Guava library has a RangeMap interface (with TreeRangeMap as one implementation) that maps disjoint ranges to values, which seems to fit your use case.

Fancier DIY Idea

To get O(log N) lookups, you could represent your category data as a binary tree:

  • Each node defines a min and max
  • Root node represents the absolute minimum to the absolute maximum, e.g. Int.MinValue to Int.MaxValue
  • Leaf nodes define a value (e.g. "child")
  • Non-leaf nodes define a split value, where the left child's max will be equal to the split, and the right child's min will be equal to the split
  • Find values in the tree by traversing left or right at each node: go left if your input number (e.g. age) is less than the node's split, and right otherwise (so an input equal to the split goes right)

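A minimal sketch of that tree, assuming the intervals are contiguous and already sorted, so the tree can be built balanced in one pass by splitting at the middle boundary (it sidesteps rebalancing because it does not support inserting intervals later). Each node's min and max are left implicit: a leaf covers the range between the splits on its path, and the root covers Int.MinValue to Int.MaxValue. The names IntervalTree, Leaf, Split and build are hypothetical.

```scala
// Hypothetical split tree: leaves hold values, inner nodes hold a split point.
sealed trait IntervalTree[+V] {
  def lookup(key: Int): V
}

final case class Leaf[V](value: V) extends IntervalTree[V] {
  def lookup(key: Int): V = value
}

final case class Split[V](split: Int, left: IntervalTree[V], right: IntervalTree[V])
    extends IntervalTree[V] {
  // Left subtree covers keys < split; right subtree covers keys >= split.
  def lookup(key: Int): V = if (key < split) left.lookup(key) else right.lookup(key)
}

object IntervalTree {
  // values(i) covers [bounds(i - 1), bounds(i)); the first value extends down to
  // Int.MinValue and the last one up to Int.MaxValue. Assumes bounds are strictly increasing.
  def build[V](bounds: IndexedSeq[Int], values: IndexedSeq[V]): IntervalTree[V] = {
    require(values.nonEmpty && bounds.length == values.length - 1, "need one more value than bounds")
    // Builds the subtree for values(lo..hi), splitting in the middle to keep it balanced.
    def go(lo: Int, hi: Int): IntervalTree[V] =
      if (lo == hi) Leaf(values(lo))
      else {
        val mid = (lo + hi) / 2 // left gets values(lo..mid), right gets values(mid+1..hi)
        Split(bounds(mid), go(lo, mid), go(mid + 1, hi))
      }
    go(0, values.length - 1)
  }
}
```

For the ages in the question, IntervalTree.build(Vector(0, 2, 10, 18), Vector("invalid", "infant", "child", "teen", "adult")) would route negative ages to a hypothetical "invalid" leaf and everything from 18 upward to "adult".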
You'd have to deal with balancing the tree as it gets built... and TBH this is probably how Guava's doing it under the hood (I did not look into the implementation)
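If hand-written balancing is the sticking point, one alternative (a swapped-in technique, not what this answer describes) is to reuse a balanced tree from the standard library: scala.collection.immutable.TreeMap is backed by a red-black tree. The sketch below keys the map by interval start, takes the entry with the greatest start at or below the lookup key, and then checks the interval's end so gaps are handled. It assumes non-overlapping half-open intervals and Scala 2.13+ (for rangeTo); the name SortedIntervalMap is hypothetical.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical SortedIntervalMap: half-open [start, end) intervals stored in a
// TreeMap keyed by start. Assumes intervals do not overlap; gaps are allowed.
final class SortedIntervalMap[V] private (byStart: TreeMap[Int, (Int, V)]) {
  // Take the interval with the greatest start <= key, then check key is below its end.
  def get(key: Int): Option[V] =
    byStart.rangeTo(key).lastOption.collect {
      case (_, (end, value)) if key < end => value
    }
}

object SortedIntervalMap {
  def apply[V](entries: ((Int, Int), V)*): SortedIntervalMap[V] =
    new SortedIntervalMap[V](TreeMap.from(entries.map { case ((start, end), v) => (start, (end, v)) }))
}
```

Usage mirrors the question's API, except that lookups return an Option: SortedIntervalMap((0, 2) -> "infant", (2, 10) -> "child", (10, 18) -> "teen", (18, 200) -> "adult").get(12) gives Some("teen"), and an age outside every interval gives None.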
