Why does Java type inference fail to distinguish between Function and Consumer?

Given the following identity functions:

<T> Consumer<T> f(Consumer<T> c) { return c; }          // (1)
<T,R> Function<T,R> f(Function<T, R> c) { return c; }   // (2)

I observe the following behaviour in JDK 11 and JDK 17:

void _void() {}
f(x -> {});                   // okay, dispatches to (1)
f(x -> { return; });          // okay, dispatches to (1)
f(x -> { _void(); });         // okay, dispatches to (1)
f(x -> _void());              // should dispatch to (1)
|  Error:
|  reference to f is ambiguous
|    both method f(java.util.function.Function<java.lang.Object,java.lang.Object>) in
|      and method f(java.util.function.Consumer<java.lang.Object>) in  match

int _one() { return 1; }
f(x -> 1);                    // okay, dispatches to (2)
f(x -> { return 1; });        // okay, dispatches to (2)
f(x -> { return _one(); });   // okay, dispatches to (2)
f(x -> _one());               // should dispatch to (2)
|  Error:
|  reference to f is ambiguous
|    both method <T,R>f(java.util.function.Function<T,R>) in
|      and method <T>f(java.util.function.Consumer<T>) in  match
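The observations above can be reproduced mechanically by handing each call to javac through the javax.tools API and checking for the ambiguity diagnostic. This is a sketch under assumptions: it needs a full JDK (ToolProvider.getSystemJavaCompiler() returns null without one), and it relies on javac's diagnostic key compiler.err.ref.ambiguous, which is a javac implementation detail rather than anything the JLS specifies. The names AmbiguityCheck, StringSource and Snippet are made up for illustration, and the helpers _void() and _one() are the ones from the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class AmbiguityCheck {

    // Wraps one call in a class that declares the two overloads from the question.
    static String source(String call) {
        return "import java.util.function.*;\n"
             + "class Snippet {\n"
             + "  <T> Consumer<T> f(Consumer<T> c) { return c; }\n"
             + "  <T, R> Function<T, R> f(Function<T, R> c) { return c; }\n"
             + "  void _void() {}\n"
             + "  int _one() { return 1; }\n"
             + "  void test() { " + call + "; }\n"
             + "}\n";
    }

    // In-memory source file, so the snippet never has to exist on disk.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String code) {
            super(URI.create("string:///Snippet.java"), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // True if javac rejects the call with "reference to f is ambiguous".
    public static boolean isAmbiguous(String call) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null on a JRE-only runtime
        if (javac == null) {
            throw new IllegalStateException("a JDK is required");
        }
        Path out;
        try {
            out = Files.createTempDirectory("snippet"); // class files land here (not cleaned up)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        DiagnosticCollector<JavaFileObject> diags = new DiagnosticCollector<>();
        boolean ok = javac.getTask(null, null, diags,
                List.of("-proc:none", "-d", out.toString()), null,
                List.of(new StringSource(source(call)))).call();
        return !ok && diags.getDiagnostics().stream()
                .anyMatch(d -> "compiler.err.ref.ambiguous".equals(d.getCode()));
    }

    public static void main(String[] args) {
        String[] calls = {
            "f(x -> {})", "f(x -> { return; })", "f(x -> { _void(); })", "f(x -> _void())",
            "f(x -> 1)", "f(x -> { return 1; })", "f(x -> { return _one(); })", "f(x -> _one())"
        };
        for (String call : calls) {
            System.out.printf("%-30s %s%n", call, isAmbiguous(call) ? "ambiguous" : "ok");
        }
    }
}
```

Each call is compiled in its own compilation unit, so one ambiguous call cannot mask the result for another.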

Why can't the compiler resolve these symbols by using the return type of the expression? The curly-brace versions work fine, and I would have thought they would be the more difficult cases. I understand that you can explicitly cast the lambda function, but that defeats the purpose of what I am trying to achieve.

Answer:

Both x -> _void() and x -> _one() are compatible with Consumer<T>, because their bodies are method invocations, i.e. statement expressions (the result of _one() is simply discarded).

When the lambda body is a block, the compiler additionally checks its "return" compatibility, i.e. whether the block is void-compatible or value-compatible. The JLS is rather explicit about void/value compatibility for block bodies:

A block lambda body is void-compatible if every return statement in the block has the form return;. A block lambda body is value-compatible if it cannot complete normally (§14.21) and every return statement in the block has the form return Expression;.
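To make the two definitions concrete, the sketch below assigns block-bodied lambdas to Consumer and Function in plain assignment contexts. The class and field names are made up for illustration. The less obvious case is a block that always throws: it has no return statements and cannot complete normally, so by the quoted definitions it is both void-compatible and value-compatible.

```java
import java.util.function.Consumer;
import java.util.function.Function;

public class BlockCompatibility {

    // { return; } is void-compatible only.
    public static final Consumer<Object> VOID_ONLY = x -> { return; };

    // { return 1; } is value-compatible only: it cannot complete normally and returns a value.
    public static final Function<Object, Integer> VALUE_ONLY = x -> { return 1; };

    // Neither of these would compile:
    // Consumer<Object> bad1 = x -> { return 1; };        // block is not void-compatible
    // Function<Object, Integer> bad2 = x -> { return; };  // block is not value-compatible

    // No return statements and cannot complete normally: both void- and value-compatible,
    // so the same block shape fits either functional interface.
    public static final Consumer<Object> THROWS_AS_CONSUMER =
            x -> { throw new IllegalStateException("consumer"); };
    public static final Function<Object, Integer> THROWS_AS_FUNCTION =
            x -> { throw new IllegalStateException("function"); };

    // Runs the action and reports whether it threw IllegalStateException.
    public static boolean throwsIse(Runnable action) {
        try {
            action.run();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        VOID_ONLY.accept("a");
        System.out.println(VALUE_ONLY.apply("a"));                             // 1
        System.out.println(throwsIse(() -> THROWS_AS_CONSUMER.accept("a")));     // true
        System.out.println(throwsIse(() -> THROWS_AS_FUNCTION.apply("a")));      // true
    }
}
```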

While that doesn't explain why the single-expression bodies fail, it says exactly why the block bodies compile: the compiler looks at the form of the return statements to judge those bodies' compatibility with Consumer or Function (in this case).

For the method invocation expressions, the fact that both of these are allowed:

Consumer<Integer> c = x -> _one(); // result discarded
Function<Integer, Integer> fn = x -> _one(); // result returned

doesn't enable the compiler to resolve the conflict you observed. Rewriting the same lambda expression with a block body resolves the conflict, simply because block bodies are checked differently, by spec.
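A minimal sketch of that rewrite, assuming static versions of the question's helpers and a hypothetical chosen field that exists only to record which overload the compiler picked. Discarding the result inside a block ({ _one(); }) leaves a void-compatible block, while { return _one(); } is value-compatible, so each form fits exactly one overload.

```java
import java.util.function.Consumer;
import java.util.function.Function;

public class BlockRewrite {

    // Hypothetical: records which overload was selected (demonstration only).
    static String chosen;

    static <T> Consumer<T> f(Consumer<T> c) { chosen = "Consumer"; return c; }          // (1)
    static <T, R> Function<T, R> f(Function<T, R> c) { chosen = "Function"; return c; } // (2)

    static void _void() {}
    static int _one() { return 1; }

    public static String voidBlock()       { f(x -> { _void(); });       return chosen; }
    public static String discardOneBlock() { f(x -> { _one(); });        return chosen; }
    public static String returnOneBlock()  { f(x -> { return _one(); }); return chosen; }

    public static void main(String[] args) {
        // The assignment contexts from the answer compile on their own.
        Consumer<Integer> c = x -> _one();
        Function<Integer, Integer> fn = x -> _one();
        c.accept(0);
        System.out.println(fn.apply(0));        // 1
        System.out.println(voidBlock());        // Consumer
        System.out.println(discardOneBlock());  // Consumer
        System.out.println(returnOneBlock());   // Function
    }
}
```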

In other words, I think the more natural question is why the block bodies compile at all in this case, given that we normally don't expect return types (or return-statement forms) to participate in overload resolution. But a lambda expression's compatibility with a functional interface type is a different matter: the fact that a block body helps the compiler pick the target type is the special behavior here.

Answer:

Let's go through the overload resolution steps in the spec to see exactly where this fails :)

First, let's determine the potentially applicable methods (§15.12.2.1). For both x -> _void() and x -> _one(), both overloads are potentially applicable, because both lambda expressions are potentially compatible with both Function<T, R> and Consumer<T>. The important condition is:

A lambda expression is potentially compatible with a functional interface type if all of the following are true: the arity of the target type's function type is the same as the arity of the lambda expression;

If the target type's function type has a void return, then the lambda body is either a statement expression (§14.8) or a void-compatible block (§15.27.2).

If the target type's function type has a (non-void) return type, then the lambda body is either an expression or a value-compatible block (§15.27.2).

(Also notice that for the cases that compile, exactly one of the methods is potentially applicable.)
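Because this step only looks at the shape of the lambda body, it can be modelled as a small lookup. This is a toy model of the §15.12.2.1 rule for one-parameter lambdas and these two overloads only, not how javac implements it; the Body categories are my own labels.

```java
import java.util.ArrayList;
import java.util.List;

public class PotentialCompatibility {

    // Hypothetical classification of a lambda body into the shapes the rule cares about.
    public enum Body {
        STATEMENT_EXPRESSION, // e.g. _void(), _one(): an expression that is also a statement expression
        PLAIN_EXPRESSION,     // e.g. 1: an expression that is not a statement expression
        VOID_BLOCK,           // e.g. {}, { return; }, { _void(); }
        VALUE_BLOCK,          // e.g. { return 1; }, { return _one(); }
        VOID_AND_VALUE_BLOCK  // e.g. { throw new RuntimeException(); }
    }

    // Void result (Consumer.accept): statement expression or void-compatible block.
    static boolean fitsVoidResult(Body b) {
        return b == Body.STATEMENT_EXPRESSION || b == Body.VOID_BLOCK
            || b == Body.VOID_AND_VALUE_BLOCK;
    }

    // Non-void result (Function.apply): any expression or a value-compatible block.
    static boolean fitsValueResult(Body b) {
        return b == Body.STATEMENT_EXPRESSION || b == Body.PLAIN_EXPRESSION
            || b == Body.VALUE_BLOCK || b == Body.VOID_AND_VALUE_BLOCK;
    }

    // Which of the overloads f(Consumer) / f(Function) survive potential applicability.
    public static List<String> potentiallyApplicable(Body b) {
        List<String> result = new ArrayList<>();
        if (fitsVoidResult(b)) result.add("f(Consumer)");
        if (fitsValueResult(b)) result.add("f(Function)");
        return result;
    }

    public static void main(String[] args) {
        for (Body b : Body.values()) {
            System.out.println(b + " -> " + potentiallyApplicable(b));
        }
    }
}
```

Only STATEMENT_EXPRESSION (and the throwing block) keeps both overloads alive, which is exactly the shape of x -> _void() and x -> _one().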

Then we try to resolve the method to invoke using strict invocation (§15.12.2.2). Loose and variable arity invocation are not very relevant here, so if this phase fails, the whole thing fails. Notice that at the start of that section, the spec defines "pertinent to applicability", and neither x -> _void() nor x -> _one() is pertinent to applicability, because implicitly typed lambda expressions never are. This will be important.

We then reach:

If m is a generic method and the method invocation does not provide explicit type arguments, then the applicability of the method is inferred as specified in §18.5.1.

According to §18.5.1, to determine the applicability of a method with respect to a call, you first add inference bounds based on the arguments and type parameters. Then you reduce and incorporate the bounds. If there are no false bounds in the result (a false bound is produced when bounds conflict), then the method is applicable. The relevant point here is that arguments that are not pertinent to applicability are not considered when adding those bounds:

To test for applicability by strict invocation:

If k ≠ n, or if there exists an i (1 ≤ i ≤ n) such that e_i is pertinent to applicability (§15.12.2.2) and either i) e_i is a standalone expression of a primitive type but F_i is a reference type, or ii) F_i is a primitive type but e_i is not a standalone expression of a primitive type; then the method is not applicable and there is no need to proceed with inference.

Otherwise, C includes, for all i (1 ≤ i ≤ k) where e_i is pertinent to applicability, ‹e_i → F_i θ›.

So the only bounds that are added are those from the type parameters' declared bounds. They are not going to conflict with each other and produce a false bound, since they are independent.

So again, both methods are applicable.

When there is more than one applicable method, we choose the most specific one. The process for doing this with generic methods is described in §15.12.2.5 and §18.5.4. It's quite long, so I won't quote it here. In principle, it works like §18.5.1: add some bounds, and if they agree with each other (no false bound), then one method is more specific than the other. In this case, however, the implicitly typed lambdas cause a false bound to be added in both directions, so neither method is more specific than the other and the call is ambiguous :(

Knowing this, you can make it work the way you want by using explicitly typed lambdas, which are pertinent to applicability:

f((Integer x) -> _one()); // (2)
f((Integer x) -> _void()); // (1)
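A runnable version of those two calls, again with static stand-ins for the question's helpers and a hypothetical chosen field that records which overload the compiler selected:

```java
import java.util.function.Consumer;
import java.util.function.Function;

public class ExplicitLambdas {

    // Hypothetical: records which overload was selected (demonstration only).
    static String chosen;

    static <T> Consumer<T> f(Consumer<T> c) { chosen = "Consumer"; return c; }          // (1)
    static <T, R> Function<T, R> f(Function<T, R> c) { chosen = "Function"; return c; } // (2)

    static void _void() {}
    static int _one() { return 1; }

    // Explicitly typed lambdas are pertinent to applicability, so their bodies are checked.
    public static String withOne()  { f((Integer x) -> _one());  return chosen; }
    public static String withVoid() { f((Integer x) -> _void()); return chosen; }

    public static void main(String[] args) {
        System.out.println(withOne());  // Function
        System.out.println(withVoid()); // Consumer
    }
}
```

As I read §18.5.4, for (Integer x) -> _one() both overloads are applicable and the Function overload wins the most-specific check because Consumer's function type returns void; for (Integer x) -> _void() the void body is not compatible with R, so only the Consumer overload is applicable.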