What does it mean for a type bound to refer to itself in Scala?-CodePudding

I was looking through some code on GitHub and came across this type declaration:

abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product with TreePatternBits {

(This is from the Spark source code, though this question is not really about Spark but Scala's type system.)

This is new to me. In plain English, what does this type bound mean?

CodePudding user response：

Hardly anyone reads white papers on System F (mentioned in the comments) and people are still using F-found type polymorphism intuitively.

All that you need to understand is that class/trait MyType[A <: MyType[A]] forces you to extends it as class Impl extends MyType[Impl], that is passing itseld as a type parameter.

trait Interface[A <: Interface[A]] {

  def getValue: String

  def setValue(value: String): A
}

Which is useful if you want to do something like:

class Impl(value: String) extends Interface[Impl] {

  override def getValue: String = value

  override def setValue(value: String): Impl = new Impl(value)
}

Without this A <: Interface[A] (type A is a subtype of/extends Impl[A]) someone could have implemented

// no F-bound type polymorphism
trait Interface[A] {

  def getValue: String

  def setValue(value: String): A
}

class Foo

class Impl extends Inteface[Foo] {

  override def getValue: String = value

  // we wanted to force implementation to have Impl here but now we cannot
  override def setValue(value: String): Foo = new Foo
}

It's useful if you have interfaces which e.g. model some algebras e.g.

trait Number[A <: Number[A]] {

  def add(a: A): A
}

to have the implementation the same input/output of methods as its own type:

class Integer(val i: Int) extends Number[Integer] {

  def add(a: Integer): Integer = new Integer(i   a.i)
}

It's more popular in OOP, as FP offers an alternative approach with the usage of type classes.

CodePudding user response：

One of the most widely used examples for motivating a construction like this, is to look at how Java's Cloneable interface is broken and how to fix it. (The same argument applies to Serializable as well.)

As you can see, the interface Cloneable is actually completely empty, it does not contain any method. And the method which does the actual cloning is Object.clone whose return type is just Object. That's not very useful and not very type-safe, since we always need to cast the result of clone back to the original type.

Let's assume we want to design an interface for an object that can be cloned.

trait GeorgeCloney:
  def klon: ???

The question is: what should the return type of the GeorgeCloney.klon method be? What is the type of the clone of an object?

Well, the type of a clone of an object is the type of the original object. But how do we express that?

In Type Theory, there is a concept called a MyType, which allows you to refer to the type of an object itself. Scala doesn't have MyTypes, but if it did, the interface could look something like this, where This is our hypothetical syntax for a MyType:

trait GeorgeCloney:
  def klon: This

Now, for example, if we had the following code:

class Foo extends GeorgeCloney
class Bar extends GeorgeCloney
class Qux extends Bar

The return type of Foo.klon would be Foo, the return type of Bar.klon would be Bar, and the return type of Qux.klon would be Qux.

Unfortunately, MyType is a very rare feature outside of a small number of academic research languages. The only production-quality, industrial-strength, mainstream programming languages I know of which have a MyType feature are TypeScript, Rust, and Raku (and you can see that I had to stretch the definition of "mainstream" quite a bit to even get more than one example). Python will have MyType by the end of 2022, which would be another truly "mainstream" language to have it.

Now, Scala actually does have this.type which at first glance might look like it fits the bill:

trait GeorgeCloney:
  def klon: this.type

However, it is actually too restrictive: foo.type is a singleton type, i.e. the type which only contains the object foo itself. Meaning that this.type is the type which only contains the object this and no other object.

Therefore, the only way to satisfy a return type constraint of this.type is to return this itself:

class Foo extends GeorgeCloney:
  def klon = this

However, what we actually want to do is to return a new instance of Foo with the same internal state as this:

class Foo extends GeorgeCloney:
  def klon = Foo()

Which is a type error.

Let's go back to the very beginning. In simplest terms, our interface looks like this:

trait GeorgeCloney:
  def klon: Any

Which forces us to always cast the returned clone to the desired type, which we don't want.

We can actually make the type slightly more precise, since we know that anything which can be cloned implements the GeorgeCloney interface:

trait GeorgeCloney:
  def klon: GeorgeCloney

But that is really not an improvement, since we still need to cast anyway, unless all we want to do with the clone is to clone it again. However, let's keep the fact that we can make the type more precise by using the knowledge that anything that is the result of a clone operation must be itself cloneable.

So, what can we do? We want to have a different return type for each cloneable type. So, we need to somehow "tell" the GeorgeCloney interface to have a different type for each cloneable type. Well, there is a mechanism for telling something about some information that may be different each time and may not be known at the same place: a parameter!

We can add a type parameter to the GeorgeCloney interface:

trait GeorgeCloney[A]:
  def klon: A

And then use it like this:

class Foo extends GeorgeCloney[Foo]:
  def klon = Foo()

Now, when I call the klon method, I get back an object of type Foo just like I expect.

However, it is pretty easy to circumvent this:

class Foo extends GeorgeCloney[String]:
  def klon = "Haha! Fooled you! I am not a clone!"

Oops. Now, when I clone a Foo, I actually get back a String, and this is perfectly type-safe.

But, remember that we said above that we actually know an extra bit of information about the type? We know that A must be GeorgeCloney itself, because a cloned object can only have come from a cloneable object, and since the cloned and the original object have the same type, we know that both the original and the clone must be cloneable. So, we know that the return value of klon is not just an A but a "cloneable A".

Well, we have already defined what the type of a cloneable A is, because that's the very type we have just written: GeorgeCloney[A]. So, we know that the A in GeorgeCloney[A] must itself be a GeorgeCloney[A]:

trait GeorgeCloney[A <: GeorgeCloney[A]]:
  def klon: A

If we try the same thing as above, we now get a type error:

class Foo extends GeorgeCloney[String]:
// Type argument String does not conform to upper bound GeorgeCloney[String]
  def klon = "Haha! Fooled you! I am not a clone!"

It is still not nearly as safe as we would like it to be, because I still can do something like this:

class EvilCloney extends GeorgeCloney[EvilCloney]:
  def klon = EvilCloney()

class InnocentLookingFakeCloney extends GeorgeCloney[EvilCloney]:
  def klon = EvilCloney()

When I try to clone InnocentLookingFakeCloney, I actually get an EvilCloney, not an InnocentLookingFakeCloney. However, it is a lot harder to do this by accident.

This does not protect against developers who deliberately circumvent the barriers put in place, but it does protect against honest mistakes.

Note that all of the above can be expressed in any type system that has parametric polymorphism (aka generics), bounded polymorphism (i.e. the ability to place constraints on type parameters), and allows the type parameter itself to appear in the constraint. You can do this in Java and C#, for example.

Scala has yet another trick up its sleeve, though: we can further restrict the type by using a self type annotation:

trait GeorgeCloney[A <: GeorgeCloney[A]]:
  this: A =>
  def klon: A

Now, we have not only constrained the type parameter to be cloneable itself, we have also constrained the type of this (i.e. the type of the class which implements this interface) to be the same as the type used for the type argument A. Therefore, the EvilCloney trick no longer works:

class EvilCloney extends GeorgeCloney[EvilCloney]:
  def klon = EvilCloney()

class InnocentLookingFakeCloney extends GeorgeCloney[EvilCloney]:
// illegal inheritance:
//   self type InnocentLookingFakeCloney
//   of class InnocentLookingFakeCloney 
//   does not conform to 
//   self type EvilCloney
//   of parent trait GeorgeCloney
  def klon = EvilCloney()

However, even this improved version still does not "do the right thing" when it comes to inheritance:

class EvilCloney extends GeorgeCloney[EvilCloney]:
  def klon = EvilCloney()

class InnocentLookingFakeCloney extends EvilCloney

InnocentLookingFakeCloney().klon
// has type `EvilCloney`, not `InnocentLookingFakeCloney`

This is not easily fixable while keeping this particular design, not even with Scala's advanced type system features, and certainly not in a language like Java or C#.

It can, however, be fixed by changing the design to use type classes instead of interfaces, which can be expressed in Scala (with some boilerplate).

If you are interested in some additional motivating examples, you might want to look at some of the following Stack Exchange answers of mine: