I am receiving a NullPointerException which I believe is due to the way objects are initialised but cannot find any supporting documentation.
I have this example code, which illustrates the problem in Scala 2.12.7; I get the same results in Scala 3.1.3:
abstract class Item(val collectionName: String)
abstract class ItemCollection(val name: String)
object TechItems extends ItemCollection("tech") {
  // referencing 'name' from 'ItemCollection' superclass
  case object TV extends Item(collectionName = name)
  val items: Map[String, Item] = Map("tv" -> TV)
}
object Test1 extends App {
  // prints 'Some(tech)'
  println(TechItems.items.get("tv").map(_.collectionName))
}
object Test2 extends App {
  // prints 'tech'
  println(TechItems.TV.collectionName)
  // throws NullPointerException
  println(TechItems.items.get("tv").map(_.collectionName))
}
When running Test1, the code behaves as you'd expect. When running Test2, we now receive a NullPointerException when accessing the map after accessing the TV object directly.
When I no longer reference a field from the superclass, the issue no longer occurs:
...
object TechItems extends ItemCollection("tech") {
  // using String instead of reference to superclass field
  case object TV extends Item(collectionName = "mycollection")
  val items: Map[String, Item] = Map("tv" -> TV)
}
...
object Test2 extends App {
  // prints 'mycollection'
  println(TechItems.TV.collectionName)
  // prints 'Some(mycollection)'
  println(TechItems.items.get("tv").map(_.collectionName))
}
My current understanding of how TechItems is initialised:
- We access TechItems.TV.collectionName, which begins initialising TechItems
- An ItemCollection("tech") is created, whose fields are then available inside of TechItems (depending on access modifiers of said superclass fields)
- TV is initialised and references the superclass field name
- items is initialised and references TV as a value for key "tv"
I am sure that understanding is wrong but that is what I am here to learn.
My current theory for the NullPointerException:
- We access TechItems.TV.collectionName, which begins initialising TechItems
- items is initialised alongside TV, but items captures an uninitialised TV as null
- Our access to TechItems.TV.collectionName returns the value of "tech"
- TechItems.items.get("tv") returns Some(null) because TV, at the point of initialising items, was null due to not being initialised
- NullPointerException is thrown
To me it feels like a somewhat far-fetched theory. I am sure my lack of understanding is shown here and there is an explanation in some documentation that I have failed to find. Why do I get this NullPointerException? What is the initialisation order? And why does removing the reference to a superclass field affect this initialisation?
CodePudding user response:
Wow, this is a good one! Here is what I think is going on ...
Consider this "pseudo-java" code, that I believe more-or-less accurately reflects what is actually happening in the JVM:
class TechItems extends ItemCollection {
  static MODULE = new TechItems("tech")
  static class TV extends Item {
    static MODULE = new TV(TechItems.MODULE.name)
  }
  val items = Map("tv" -> TV.MODULE)
}
So, now, when you do print(TechItems.TV.MODULE.collectionName), TechItems.MODULE gets constructed, because we need to pull name out of it to create TV.
This constructor runs to the Map("tv" -> TV.MODULE) line and puts null into the map (TV.MODULE is still null, because we are only figuring out what to pass to its constructor).
If you use "mycollection" instead of name, it becomes static MODULE = new TV("mycollection"), which doesn't trigger the TechItems constructor.
What happens when you don't access TV before looking at items? Well, in that case, TechItems.MODULE gets initialized first, so, by the time you get to the new TV thing, as part of constructing the items, TechItems.MODULE.name is already available, so TV.MODULE can be created and put into the map.
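To make the ordering concrete, here is a minimal runnable Java sketch of the same cycle. It is only an illustration under assumptions: the names InitOrderDemo, Outer and Inner are made up and stand in for TechItems and TV, and the layout is not what scalac actually emits.

```java
import java.util.HashMap;
import java.util.Map;

public class InitOrderDemo {

    // Hypothetical stand-in for the TechItems module class.
    static class Outer {
        static final Outer MODULE = new Outer("tech");
        // HashMap is used because it accepts null values.
        static final Map<String, Inner> ITEMS = new HashMap<>();

        static {
            // If Inner is still being initialised by this thread,
            // Inner.MODULE has not been assigned yet and reads as null.
            ITEMS.put("tv", Inner.MODULE);
        }

        final String name;

        Outer(String name) {
            this.name = name;
        }
    }

    // Hypothetical stand-in for the TV module class.
    static class Inner {
        // Evaluating the constructor argument forces Outer to initialise first.
        static final Inner MODULE = new Inner(Outer.MODULE.name);

        final String collectionName;

        Inner(String collectionName) {
            this.collectionName = collectionName;
        }
    }

    public static void main(String[] args) {
        // Touching Inner first reproduces the Test2 ordering.
        System.out.println(Inner.MODULE.collectionName); // prints "tech"
        System.out.println(Outer.ITEMS.get("tv"));       // prints "null"
    }
}
```

If main read Outer.ITEMS first instead, Outer would finish assigning MODULE before Inner's initialiser asked for its name, and the map would hold the real Inner instance.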
CodePudding user response:
Very instructive example indeed and Dima is absolutely right! In fact, without inspecting the decompiled code, it would be harder to figure out what is happening under the hood. For simplicity, let's assume you just do these 2 calls in order (it will reproduce the issue):
println(TechItems.TV) // prints 'TV'
println(TechItems.items) // prints 'Map(tv -> null)'
Now let's decompile the code and show only the relevant parts (I removed unnecessary code to make it easier to follow). First, these calls:
Predef$.MODULE$.println((Object)Main.TechItems$.TV$.MODULE$);
Predef$.MODULE$.println((Object)Main.TechItems$.MODULE$.items());
This was our Main. Now TechItems and TV:
public static class TechItems$ extends ItemCollection {
    public static final TechItems$ MODULE$;
    private static final Map<String, Main.Item> items;

    static {
        MODULE$ = new TechItems$();
        items = (Map)Predef$.MODULE$.Map().apply((Seq)ScalaRunTime$.MODULE$.wrapRefArray(
            (Object[])new Tuple2[] {
                Predef.ArrowAssoc$.MODULE$.$minus$greater$extension(
                    Predef$.MODULE$.ArrowAssoc((Object)"tv"), (Object)TV$.MODULE$)
            }));
    }

    public Map<String, Main.Item> items() {
        return TechItems$.items;
    }

    public TechItems$() {
        super("tech");
    }

    public static class TV$ extends Main.Item implements Product, Serializable {
        public static final TV$ MODULE$;

        static {
            Product.$init$((Product)(MODULE$ = new TV$()));
        }

        public TV$() {
            super(TechItems$.MODULE$.name());
        }
    }
}
When calling our first println statement, we trigger the evaluation of TechItems.TV, which translates to TechItems$.TV$.MODULE$. MODULE$ is just a static final reference to TV that gets initialized in the static block of TV. To get initialized, it starts executing the static block, which in turn calls TV's constructor, new TV$(), which in turn triggers the call to TechItems via: super(TechItems$.MODULE$.name());
This is the part where it gets interesting: TechItems$.MODULE$ is just the static final reference to TechItems, which was not yet referenced, so it was not yet initialized. Again, in the same manner, to get initialized, the static block of TechItems gets called. But this time the static block is different: it has to initialize TechItems$.MODULE$ and items as well, because both reside in the same static block.
Since we are still in the middle of initializing TV$.MODULE$ (its constructor has not returned yet), and the initialization of items needs that same reference, the reference is still null at this point in time, so items is built with TV$.MODULE$ as null.
After this, the static block of TechItems$ finishes, the static block of TechItems.TV finishes, and TV gets printed at the console. The second print becomes self-explanatory. The call to items() returns TechItems$.items, which was already evaluated during the previous call to TV, so items returns Map(tv -> null), which gets printed.
Observations:
- Using case object TV extends Item(collectionName = name) is precisely what triggers the issue. The idea is that you do not want to evaluate items before TV finishes evaluation. So one can do two things: (1) either avoid calling TV before first calling items or just TechItems, which will trigger the evaluation of TV and thus the correct initialization of items, or (2), the better solution, delay evaluation of items as much as possible, until you really need it.
- Naturally, the solution to the second point is to make items a lazy val. If we do this, the issue goes away, because items will no longer be evaluated unless explicitly referenced by us, and it will no longer trigger evaluation when calling just TV. And if we call items first, it will trigger TV's evaluation first. I am not showing the decompiled difference here; as far as I can tell, the initialization of items simply moves out of the static block and into the items() accessor, where it runs once on first access.
- Changing it to case object TV extends Item(collectionName = "mycollection") is also a fix. Since you no longer call super(TechItems$.MODULE$.name()); from TV at all, the evaluation of items is no longer triggered when just TV is called. The call to TV's constructor becomes super("mycollection"), so the second print would then correctly evaluate items to Map(tv -> TV). This is why the null goes away when you change it.
- This is an example of a circular dependency: TV "kind of" needs items and items needs TV, and the order of initialization really makes the difference between code that works and code that throws nulls at unexpected times. Since TV is presumably initialized lazily, making items lazy as well should theoretically remove the circular dependency. An object definition in Scala behaves much like a lazy val with an anonymous class, which gets initialized on demand, the first time it is used.
- So the first instinct when you see an object inside another object is to assume the inner object will be lazily initialized (unless explicitly referenced). Because items does reference TV explicitly, even if you don't call TV explicitly, TV will be evaluated either when referencing just TechItems or items directly, whichever comes first, because both are in the same static context, as we saw.
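The effect of deferring items can be imitated in plain Java with the initialization-on-demand holder idiom. This is an analogy only, not what Scala generates for a lazy val, and the names LazyItemsDemo, Outer, Inner and ItemsHolder are hypothetical.

```java
import java.util.Collections;
import java.util.Map;

public class LazyItemsDemo {

    // Hypothetical stand-in for the TechItems module class.
    static class Outer {
        static final Outer MODULE = new Outer("tech");

        final String name;

        Outer(String name) {
            this.name = name;
        }

        // The holder's static initialiser runs only on the first call to items(),
        // so initialising Outer no longer touches Inner.
        private static class ItemsHolder {
            static final Map<String, Inner> ITEMS =
                Collections.singletonMap("tv", Inner.MODULE);
        }

        static Map<String, Inner> items() {
            return ItemsHolder.ITEMS;
        }
    }

    // Hypothetical stand-in for the TV module class.
    static class Inner {
        static final Inner MODULE = new Inner(Outer.MODULE.name);

        final String collectionName;

        Inner(String collectionName) {
            this.collectionName = collectionName;
        }
    }

    public static void main(String[] args) {
        // Same access order as Test2: Inner first, then the map.
        System.out.println(Inner.MODULE.collectionName);              // prints "tech"
        System.out.println(Outer.items().get("tv").collectionName);   // prints "tech"
    }
}
```

Because the map is built only when items() is first called, Inner has either finished initialising already or is initialised in full at that point, whichever access comes first.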