Which version of GetAttributeValue of the 'HTML Agility Pack' is used when calling from PowerShell

I am writing a PowerShell script to work in Windows 10. I am using the 'HTML Agility Pack' library version 1.11.43.

In this library, there is a GetAttributeValue method for HTML element nodes in four versions:

public string GetAttributeValue(string name, string def)
public int GetAttributeValue(string name, int def)
public bool GetAttributeValue(string name, bool def)
public T GetAttributeValue<T>(string name, T def)

I have written a test script for these methods in PowerShell:

$libPath = "HtmlAgilityPack.1.11.43\lib\netstandard2.0\HtmlAgilityPack.dll"
Add-Type -Path $libPath
$dom = New-Object -TypeName "HtmlAgilityPack.HtmlDocument"
$dom.Load("test.html", [System.Text.Encoding]::UTF8)

foreach ($node in $dom.DocumentNode.DescendantNodes()) {
    if ("#text" -ne $node.Name) {
        $node.OuterHTML
        "    " + $node.GetAttributeValue("class", "")
        "    " + $node.GetAttributeValue("class", 0)
        "    " + $node.GetAttributeValue("class", $true)
        "    " + $node.GetAttributeValue("class", $false)
        "    " + $node.GetAttributeValue("class", $null)
    }
}

File 'test.html':

<p class="true"></p>
<p class="false"></p>
<p></p>
<p class="any other text"></p>

Test script execution result:

<p class="true"></p>
    true
    0
    True
    True
    True
<p class="false"></p>
    false
    0
    False
    False
    False
<p></p>

    0
    True
    False
    False
<p class="any other text"></p>
    any other text
    0
    True
    False
    False

I know that to get the attribute value of an HTML element, you can also use the expression $node.Attributes["class"]. I also understand what polymorphism and method overloading are, and I know what a generic method is. There is no need to explain any of that.

I have three questions:

1. When $node.GetAttributeValue("class", $null) is called from a PowerShell script, which of the four variants of the GetAttributeValue method is used?
2. I think the fourth variant (the generic method) is used. If so, why does a call with $null as the second argument behave exactly like a call with $false as the second argument?
3. In the C# source code, the fourth variant is compiled only under the following condition:

#if !(METRO || NETSTANDARD1_3 || NETSTANDARD1_6)

I tried the library builds for NETSTANDARD1_6 and for NETSTANDARD2_0, and the test script behaves the same way with both. But with NETSTANDARD1_6 the fourth variant should be unavailable, right? In that case, which variant of GetAttributeValue is used when the second argument is $null?

CodePudding user response：

tl;dr

To achieve what you unsuccessfully attempted with
$node.GetAttributeValue("class", $null), i.e., to return the attribute value as a [string] and default to $null if there is none, use:

$node.GetAttributeValue("class", [string] [NullString]::Value)

Note: [string] $null works too, but makes "" (the empty string) rather than $null the default value.

While the overload resolution that you're seeing is surprising, you can resolve ambiguity during PowerShell's method overload resolution with casts:

$dom = [HtmlAgilityPack.HtmlDocument]::new()
$dom.LoadHtml(@'
<p class="true"></p>
<p class=42></p>
<p></p>
<p class="any other text"></p>
'@)

$nodes = $dom.DocumentNode.SelectNodes('p')

# Note the use of explicit casts (e.g., [string]) to guide overload resolution.
$nodes[0].GetAttributeValue('class', [bool] $false)
$nodes[1].GetAttributeValue('class', [int] 0)
$nodes[2].GetAttributeValue('class', [string] 'default')
$nodes[3].GetAttributeValue('class', [string] [NullString]::Value)

Output:

True
42
default
any other text

Alternatively, in PowerShell (Core) 7.3+ [1], you can now call generic methods with explicit type arguments:

# PS 7.3+
# Note the generic type argument directly after the method name.
# Calls the one and only generic overload, with various types substituted for T:
#   public T GetAttributeValue<T>(string name, T def)
# Note how the 2nd argument doesn't need a cast anymore.
$nodes[0].GetAttributeValue[bool]('class', $false)
$nodes[1].GetAttributeValue[int]('class', 0)
$nodes[2].GetAttributeValue[string]('class', 'default')
$nodes[3].GetAttributeValue[string]('class', [NullString]::Value)

Note:

When you pass $null to a [string]-typed parameter (both in cmdlets and .NET methods), PowerShell quietly converts it to "" (the empty string). [NullString]::Value tells PowerShell to pass a true null instead, and is mostly needed for calling .NET methods whose behavior differs between null and "".
Therefore, if you were to call $nodes[3].GetAttributeValue('class', [string] $null) or, in PS 7.3+, $nodes[3].GetAttributeValue[string]('class', $null), you'd get "" (the empty string) as the default value if the class attribute doesn't exist.
By contrast, [NullString]::Value, as used in the commands above, causes a true $null value to be returned if the attribute doesn't exist; you can test for that with $null -eq ....
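The difference between $null and [NullString]::Value can be observed without the library. The following is a minimal sketch that assumes a hypothetical helper class, NullDemo, compiled ad hoc with Add-Type. It is not part of HTML Agility Pack and merely reports what its string parameter actually received:

```powershell
# NullDemo is a hypothetical helper for illustration only (not part of
# HTML Agility Pack): it reports what its string parameter received.
Add-Type -TypeDefinition @'
public static class NullDemo
{
    public static string Describe(string s)
    {
        if (s == null) { return "null"; }
        return s.Length == 0 ? "empty" : "text";
    }
}
'@

# PowerShell converts $null to "" when binding to a string parameter.
$fromNull = [NullDemo]::Describe($null)

# [NullString]::Value asks PowerShell to pass a true null instead.
$fromNullString = [NullDemo]::Describe([NullString]::Value)

"`$null              -> $fromNull"        # -> empty
"[NullString]::Value -> $fromNullString"  # -> null
```

The same distinction is what decides whether GetAttributeValue hands back "" or a true $null as its default.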

As for your questions:

On a general note, PowerShell's overload resolution is complex, and for the ultimate source of truth you'll have to consult the source code. The following is based on the de-facto behavior as of PowerShell 7.2.6 and musings about logic that could be applied.

When calling $node.GetAttributeValue("class", $null) from a PowerShell script, which of the four variants of the GetAttributeValue method works?

In practice, the public bool GetAttributeValue(string name, bool def) overload is chosen. Why this one in particular is chosen is ultimately immaterial: the fundamental problem is that, to PowerShell, $null provides insufficient information about the type it may be a stand-in for, so PowerShell cannot generally be expected to select a specific overload (for that, you need a cast, as shown at the top):

In C#, passing null to the second parameter in a non-generic call unambiguously implies the overload with the string-typed def parameter, because among the non-generic overloads, string is the only .NET reference type used for the def parameter, and therefore the only type that can directly accept a null argument.
This is not true in PowerShell, which has much more flexible, implicit type-conversion rules: from PowerShell's perspective, $null can bind to any of the types among the def parameters, because it allows $null to be converted to those types; specifically, [bool] $null yields $false, [int] $null yields 0, and - perhaps surprisingly, as discussed above - [string] $null yields "" (the empty string).
Thus, PowerShell is justified in selecting any one of the non-generic overloads in this case, and which one it chooses should be considered an implementation detail.
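The conversions named above can be checked directly. This short sketch uses nothing beyond built-in casts, and the variable names are made up for illustration:

```powershell
# Casting $null to each of the three non-generic 'def' parameter types
# succeeds, which is why $null alone cannot pin down an overload.
$asBool   = [bool] $null     # $false
$asInt    = [int] $null      # 0
$asString = [string] $null   # '' (the empty string), not $null

$asBool                      # -> False
$asInt                       # -> 0
$asString.Length             # -> 0
$null -eq $asString          # -> False: the result is a real, empty string
```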

However, curiously, even using [NullString]::Value doesn't make a difference, even though PowerShell should know that this special value represents a $null value for a string parameter - see GitHub issue #18072.

I think the fourth option works (generic method). Then why does a call with the second parameter $null work exactly the same as a call with the second parameter $false?

With the generic invocation syntax available in v7.3+, the generic overload definitely works - and a $null as the default-value argument is converted to the type specified as the type argument (assuming PowerShell allows such a conversion; it wouldn't work with [datetime], for instance, because [datetime] $null causes an error).
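As a small illustration of that caveat, the following sketch contrasts a type that accepts a conversion from $null with one that doesn't. The variable names are made up for illustration, and the exact error message text may vary between versions:

```powershell
# [double] $null converts without complaint ...
$asDouble = [double] $null
$asDouble.GetType().FullName   # -> System.Double

# ... whereas [datetime] $null raises a conversion error.
$conversionFailed = $false
try {
    $null = [datetime] $null
}
catch {
    $conversionFailed = $true
    "Conversion failed: $($_.Exception.Message)"
}
```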

Even with the non-generic syntax, PowerShell does select the generic overload by inference, as the following example shows, but only when you pass an actual object rather than $null:

# Try to retrieve a non-existent attribute and provide a [double]
# default value.
# The fact that a [double] instance is returned implies that the
# generic overload was chosen.
#  -> 'System.Double'
$nodes[0].GetAttributeValue('nosuch', [double] $null).GetType().FullName

In the C# source code, the fourth option requires the following condition to work [...]

When you pass $null, the generic overload is not considered - and cannot be, in the absence of type information - so this doesn't make a difference.
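To check which overloads a given build actually exposes, you can inspect the loaded type instead of reading the #if conditions. The sketch below assumes a hypothetical stand-in class, FakeNode, that merely mirrors the four signatures from the question so that the snippet runs without the library:

```powershell
# FakeNode is a hypothetical stand-in that mirrors the four signatures
# from the question; it is not the library's HtmlNode class.
Add-Type -TypeDefinition @'
public class FakeNode
{
    public string GetAttributeValue(string name, string def) { return def; }
    public int GetAttributeValue(string name, int def) { return def; }
    public bool GetAttributeValue(string name, bool def) { return def; }
    public T GetAttributeValue<T>(string name, T def) { return def; }
}
'@

$node = [FakeNode]::new()

# Referencing a method without parentheses exposes its overload list.
$node.GetAttributeValue.OverloadDefinitions

# Reflection shows whether a generic overload exists in the loaded type.
$generic = @([FakeNode].GetMethods() | Where-Object {
    $_.Name -eq 'GetAttributeValue' -and $_.IsGenericMethodDefinition
})
"Generic overloads found: $($generic.Count)"
```

With the library loaded, substituting [HtmlAgilityPack.HtmlNode] for [FakeNode] (or using one of the $nodes elements) should show whether the netstandard1.6 build lacks the generic overload.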

[1] As of this writing, v7.3 hasn't been released yet, but preview versions are available - see the repo.