NullPointerException when accessing case object in map

I am receiving a NullPointerException which I believe is due to the way objects are initialised but cannot find any supporting documentation.

I have this example code, which illustrates the problem in Scala 2.12.7; I can reproduce the same results in Scala 3.1.3:

abstract class Item(val collectionName: String)

abstract class ItemCollection(val name: String)

object TechItems extends ItemCollection("tech") {
  // referencing 'name' from 'ItemCollection' superclass
  case object TV extends Item(collectionName = name)

  val items: Map[String, Item] = Map("tv" -> TV)
}

object Test1 extends App {
  // prints 'Some(tech)'
  println(TechItems.items.get("tv").map(_.collectionName))
}

object Test2 extends App {
  // prints 'tech'
  println(TechItems.TV.collectionName)

  // throws NullPointerException
  println(TechItems.items.get("tv").map(_.collectionName))
}

When running Test1, the code behaves as you'd expect. When running Test2, we now receive a NullPointerException when accessing the map after accessing the TV object directly.

When I no longer reference a field from the superclass, the issue no longer occurs:

...

object TechItems extends ItemCollection("tech") {
  // using String instead of reference to superclass field
  case object TV extends Item(collectionName = "mycollection")

  val items: Map[String, Item] = Map("tv" -> TV)
}

...

object Test2 extends App {
  // prints 'mycollection'
  println(TechItems.TV.collectionName)

  // prints 'Some(mycollection)'
  println(TechItems.items.get("tv").map(_.collectionName))
}

My current understanding of how TechItems is initialised:

1. We access TechItems.TV.collectionName, which begins initialising TechItems
2. An ItemCollection("tech") is created, whose fields are then available inside TechItems (depending on the access modifiers of said superclass fields)
3. TV is initialised and references the superclass field name
4. items is initialised and references TV as the value for key "tv"

I am sure that understanding is wrong but that is what I am here to learn.

My current theory for the NullPointerException:

1. We access TechItems.TV.collectionName, which begins initialising TechItems
2. items is initialised alongside TV, but items captures an uninitialised TV as null
3. Our access to TechItems.TV.collectionName returns the value "tech"
4. TechItems.items.get("tv") returns Some(null), because TV was still null (not yet initialised) at the point when items was initialised
5. NullPointerException is thrown

To me it feels like a somewhat farfetched theory. I am sure my lack of understanding is shown here and there is an explanation in some documentation that I have failed to find. Why do I get this NullPointerException? What is the initialisation order? And why does removing the reference to a superclass field affect this initialisation?

CodePudding user response：

Wow, this is a good one! Here is what I think is going on ...

Consider this "pseudo-java" code, that I believe more-or-less accurately reflects what is actually happening in the JVM:

class TechItems extends ItemCollection {
    static MODULE = new TechItems("tech")
    static class TV extends Item {
       static MODULE = new TV(TechItems.MODULE.name)
    }
    val items = Map("tv" -> TV.MODULE)
}

So, now, when you do print(TechItems.TV.MODULE.collectionName), TechItems.MODULE gets constructed, because we need to pull name out of it to create TV.

That constructor runs to the Map("tv" -> TV.MODULE) line and puts null into the map: TV.MODULE is still null at this point, because we are only working out what to pass to TV's constructor.

If you use "mycollection" instead of name, it becomes static MODULE = new TV("mycollection"), which doesn't trigger the TechItems constructor.

What happens when you don't access TV before looking at items? Well, in that case, TechItems.MODULE gets initialized first, so, by the time you get to the new TV thing, as part of constructing the items, TechItems.MODULE.name is already available, so TV.MODULE can be created and put into the map.
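One way to check this ordering for yourself is to log from inside the initializers. The following is a minimal sketch, not part of the original code: InitLog is a hypothetical helper introduced here, and the log messages are made up for illustration. It assumes the objects are compiled as ordinary top-level definitions, as in the question.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper: records messages in the order the initializers run.
object InitLog {
  val events: ListBuffer[String] = ListBuffer.empty[String]
  def log(msg: String): Unit = { events += msg; () }
}

abstract class Item(val collectionName: String)

abstract class ItemCollection(val name: String)

object TechItems extends ItemCollection("tech") {
  InitLog.log("TechItems: initializer started")

  // The block is evaluated while TV's constructor arguments are computed.
  case object TV extends Item(collectionName = {
    InitLog.log("TV: reading TechItems.name")
    name
  })

  val items: Map[String, Item] = {
    InitLog.log("TechItems: building items")
    Map("tv" -> TV)
  }

  InitLog.log("TechItems: initializer finished")
}

object InitOrderDemo {
  def main(args: Array[String]): Unit = {
    // Touch TV first, as Test2 does; swap the next two lines to compare.
    println(TechItems.TV.collectionName)
    println(TechItems.items)
    InitLog.events.foreach(println)
  }
}
```

If the explanation above is right, the "TV: reading TechItems.name" message should come before any TechItems message when TV is touched first, and after "TechItems: building items" when items is touched first.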

CodePudding user response：

Very instructive example indeed and Dima is absolutely right! In fact, without inspecting the decompiled code, it would be harder to figure out what is happening under the hood. For simplicity, let's assume you just do these 2 calls in order (it will reproduce the issue):

println(TechItems.TV)        // prints 'TV'
println(TechItems.items)     // prints 'Map(tv -> null)'

Now let's decompile the code and show only the relevant parts (I removed unnecessary code to make it easier to follow). First, these calls:

Predef$.MODULE$.println((Object)Main.TechItems$.TV$.MODULE$);
Predef$.MODULE$.println((Object)Main.TechItems$.MODULE$.items());

This was our Main. Now TechItems and TV:

public static class TechItems$ extends ItemCollection {
    public static final TechItems$ MODULE$;
    private static final Map<String, Main.Item> items;
        
    static {
        MODULE$ = new TechItems$();
        items = (Map)Predef$.MODULE$.Map().apply((Seq)ScalaRunTime$.MODULE$.wrapRefArray(
                 (Object[])new Tuple2[] { 
                     Predef.ArrowAssoc$.MODULE$.$minus$greater$extension(
                     Predef$.MODULE$.ArrowAssoc((Object)"tv"), (Object)TV$.MODULE$) 
                 }));
    }

    public Map<String, Main.Item> items() {
        return TechItems$.items;
    }
    
    public TechItems$() {
        super("tech");
    }

    public static class TV$ extends Main.Item implements Product, Serializable {
        public static final TV$ MODULE$;

        static {
            Product.$init$((Product)(MODULE$ = new TV$()));
        }

        public TV$() {
            super(TechItems$.MODULE$.name());
        }
    }
}

When calling our first println statement, we trigger the evaluation of TechItems.TV, which translates to TechItems$.TV$.MODULE$. MODULE$ is just a static final reference to TV that is assigned in the static block of TV$. That static block calls TV's constructor, new TV$(), which in turn reaches into TechItems via: super(TechItems$.MODULE$.name());

This is the part where it gets interesting: TechItems$.MODULE$ is just the static final reference of TechItems, that was not yet referenced, so it was not yet initialized. Again, in the same manner, to get initialized, the static block of TechItems gets called. But this time the static block is different: It has to initialize TechItems$.MODULE$ and items as well, because both reside in the same static block.

We are still in the middle of initializing TV$.MODULE$, and the static block of TechItems$ now builds items, which needs that very reference. Because the static block of TV$ is already running on this thread, the JVM does not start it again, and TV$.MODULE$ still holds its default value, null. So items is built with null as the value for "tv".

After this, the static block of TechItems$ finishes, the static block of TechItems.TV finishes, and TV is printed to the console. The second print then explains itself: the call to items() returns TechItems$.items, which was already evaluated during the earlier access to TV, so items returns Map(tv -> null), which gets printed.

Observations:

Using case object TV extends Item(collectionName = name) is precisely what triggers the issue. The idea is that you do not want items to be evaluated before TV has finished initializing. So one can do two things: (1) avoid accessing TV before first accessing items or TechItems itself, which triggers the evaluation of TV and thus the correct initialization of items; or (2), the better solution, delay the evaluation of items as long as possible, until you really need it.

Naturally, the solution to the second point is to make items a lazy val. If we do this, the issue goes away, because items is no longer evaluated unless we reference it explicitly, so accessing just TV no longer triggers it. And if we access items first, it triggers TV's evaluation first. In the compiled code, a lazy val is no longer assigned in the initializer: its accessor computes the value on the first call and caches it.
Changing it to case object TV extends Item(collectionName = "mycollection") is also a fix. Since you no longer call super(TechItems$.MODULE$.name()); from TV at all, items's evaluation is no longer triggered when just TV is called. The call to TV's constructor becomes super("mycollection"), so the second print would then correctly evaluate items to Map(tv -> TV). This is why the null goes away when you change it.
This is an example of a circular dependency: TV "kind of" needs items and items needs TV, and the order of initialization makes the difference between code that works and code that produces nulls at unexpected times. Since TV is presumably initialized lazily, making items lazy as well should in theory remove the circular dependency. An object definition in Scala behaves much like a lazy val with an anonymous class: it gets initialized on demand, the first time it is used.

So the first instinct when you see an object inside another object is to assume that the inner object will be lazily initialized (unless explicitly referenced). Because items does reference TV explicitly, TV will be evaluated even if you never touch it yourself: either when TechItems is first referenced or when items is, whichever comes first, because both are initialized in the same static context, as we saw.
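The lazy val fix described above can be sketched as follows. This is the question's code with a single change (val becomes lazy val); the LazyFixDemo object and its main method are only there for illustration.

```scala
abstract class Item(val collectionName: String)

abstract class ItemCollection(val name: String)

object TechItems extends ItemCollection("tech") {
  // Still reads 'name' from the enclosing object, as in the question.
  case object TV extends Item(collectionName = name)

  // lazy: the map is built on first access to 'items',
  // not while TechItems itself is being initialized.
  lazy val items: Map[String, Item] = Map("tv" -> TV)
}

object LazyFixDemo {
  def main(args: Array[String]): Unit = {
    // Same access order as Test2: TV first, then the map.
    // prints 'tech'
    println(TechItems.TV.collectionName)

    // prints 'Some(tech)'
    println(TechItems.items.get("tv").map(_.collectionName))
  }
}
```

Because items is only built when it is first read, TV has already finished initializing by then, whichever of the two is accessed first.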