I am writing a PowerShell script to work in Windows 10. I am using the 'HTML Agility Pack' library version 1.11.43.
In this library, there is a GetAttributeValue method for HTML element nodes in four versions:
public string GetAttributeValue(string name, string def)
public int GetAttributeValue(string name, int def)
public bool GetAttributeValue(string name, bool def)
public T GetAttributeValue<T>(string name, T def)
I have written a test script for these methods in PowerShell:
$libPath = "HtmlAgilityPack.1.11.43\lib\netstandard2.0\HtmlAgilityPack.dll"
Add-Type -Path $libPath
$dom = New-Object -TypeName "HtmlAgilityPack.HtmlDocument"
$dom.Load("test.html", [System.Text.Encoding]::UTF8)
foreach ($node in $dom.DocumentNode.DescendantNodes()) {
    if ("#text" -ne $node.Name) {
        $node.OuterHTML
        " " + $node.GetAttributeValue("class", "")
        " " + $node.GetAttributeValue("class", 0)
        " " + $node.GetAttributeValue("class", $true)
        " " + $node.GetAttributeValue("class", $false)
        " " + $node.GetAttributeValue("class", $null)
    }
}
File 'test.html':
<p class="true"></p>
<p class="false"></p>
<p></p>
<p class="any other text"></p>
Test script execution result:
<p class="true"></p>
true
0
True
True
True
<p class="false"></p>
false
0
False
False
False
<p></p>
0
True
False
False
<p class="any other text"></p>
any other text
0
True
False
False
I know that to get the attribute value of an HTML element, you can also use the expression $node.Attributes["class"]. I also understand what polymorphism and method overloading are. I also know what a generic method is. I don't need to explain that.
I have three questions:
1. When calling $node.GetAttributeValue("class", $null) from a PowerShell script, which of the four variants of the GetAttributeValue method works?
2. I think the fourth option works (generic method). Then why does a call with the second parameter $null work exactly the same as a call with the second parameter $false?
3. In the C# source code, the fourth option requires the following condition to work:
#if !(METRO || NETSTANDARD1_3 || NETSTANDARD1_6)
I tried the library versions for NETSTANDARD1_6 and for NETSTANDARD2_0. The test script works the same way with both. But with NETSTANDARD1_6 the fourth option should be unavailable, right? Then, with NETSTANDARD1_6, which version of the GetAttributeValue method works with the second parameter $null?
CodePudding user response:
tl;dr
To achieve what you unsuccessfully attempted with $node.GetAttributeValue("class", $null), i.e., to return the attribute value as a [string] and default to $null if there is none, use:
$node.GetAttributeValue("class", [string] [NullString]::Value)
[string] $null works too, but makes "" (the empty string) rather than $null the default value.
While the overload resolution that you're seeing is surprising, you can resolve ambiguity during PowerShell's method overload resolution with casts:
$dom = [HtmlAgilityPack.HtmlDocument]::new()
$dom.LoadHtml(@'
<p class="true"></p>
<p class=42></p>
<p></p>
<p class="any other text"></p>
'@)
$nodes = $dom.DocumentNode.SelectNodes('p')
# Note the use of explicit casts (e.g., [string]) to guide overload resolution.
$nodes[0].GetAttributeValue('class', [bool] $false)
$nodes[1].GetAttributeValue('class', [int] 0)
$nodes[2].GetAttributeValue('class', [string] 'default')
$nodes[3].GetAttributeValue('class', [string] [NullString]::Value)
Output:
True
42
default
any other text
Alternatively, in PowerShell (Core) 7.3+ [1], you can now call generic methods with explicit type arguments:
# PS 7.3+
# Note the generic type argument directly after the method name.
# Calls the one and only generic overload, with various types substituted for T:
# public T GetAttributeValue<T>(string name, T def)
# Note how the 2nd argument doesn't need a cast anymore.
$nodes[0].GetAttributeValue[bool]('class', $false)
$nodes[1].GetAttributeValue[int]('class', 0)
$nodes[2].GetAttributeValue[string]('class', 'default')
$nodes[3].GetAttributeValue[string]('class', [NullString]::Value)
Note:
When you pass $null to a [string]-typed parameter (both in cmdlets and .NET methods), PowerShell actually converts it quietly to "" (the empty string). [NullString]::Value tells PowerShell to pass a true null instead, and is mostly needed for calling .NET methods where a behavioral distinction can result from passing null vs. "".
Therefore, if you were to call $nodes[3].GetAttributeValue('class', [string] $null) or, in PS 7.3+, $nodes[3].GetAttributeValue[string]('class', $null), you'd get "" (empty string) as the default value if attribute class doesn't exist. By contrast, [NullString]::Value, as used in the commands above, causes a true $null value to be returned if the attribute doesn't exist; you can test for that with $null -eq ...
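The difference between $null and [NullString]::Value for a [string]-typed .NET parameter can be observed in isolation. The following is only a sketch: NullDemo and its Describe method are hypothetical names, compiled ad hoc with Add-Type for illustration, and are not part of HTML Agility Pack.

```powershell
# NullDemo is a hypothetical helper type (not part of any library); it reports
# what its string parameter actually received.
Add-Type -TypeDefinition @'
public static class NullDemo
{
    public static string Describe(string s)
    {
        return s == null ? "null" : "string of length " + s.Length;
    }
}
'@

# $null is quietly converted to "" before the method is called.
$fromNull = [NullDemo]::Describe($null)

# [NullString]::Value makes PowerShell pass a true null reference.
$fromNullString = [NullDemo]::Describe([NullString]::Value)

$fromNull          # string of length 0
$fromNullString    # null
```

Because Add-Type cannot redefine a type within a session, run the sketch in a fresh session if you want to change the C# code.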
As for your questions:
On a general note, PowerShell's overload resolution is complex, and for the ultimate source of truth you'll have to consult the source code. The following is based on the de-facto behavior as of PowerShell 7.2.6 and musings about logic that could be applied.
When calling $node.GetAttributeValue("class", $null) from a PowerShell script, which of the four variants of the GetAttributeValue method works?
In practice, the public bool GetAttributeValue(string name, bool def) overload is chosen; why it, specifically, is chosen among the available overloads is ultimately immaterial, because the fundamental problem is that to PowerShell, $null provides insufficient information as to the type it may be a stand-in for, so it cannot generally be expected to select a specific overload (for the latter, you need a cast, as shown at the top):
In C#, passing null to the second parameter in a non-generic call unambiguously implies the overload with the string-typed def parameter, because among the non-generic overloads, string as the type of the def parameter is the only .NET reference type, and therefore the only type that can directly accept a null argument.
This is not true in PowerShell, which has much more flexible, implicit type-conversion rules: from PowerShell's perspective, $null can bind to any of the types among the def parameters, because it allows $null to be converted to those types; specifically, [bool] $null yields $false, [int] $null yields 0, and - perhaps surprisingly, as discussed above - [string] $null yields "" (the empty string).
Thus, PowerShell is justified in selecting any one of the non-generic overloads in this case, and which one it chooses should be considered an implementation detail.
However, curiously, even using [NullString]::Value doesn't make a difference, even though PowerShell should know that this special value represents a $null value for a string parameter - see GitHub issue #18072.
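The conversions of $null to the types of the non-generic def parameters can be checked directly, independently of HTML Agility Pack. This is a minimal sketch; the variable names are made up for illustration.

```powershell
# PowerShell converts $null to each of the non-generic 'def' parameter types,
# which is why $null alone cannot identify a single overload.
$asBool   = [bool] $null      # $false
$asInt    = [int] $null       # 0
$asString = [string] $null    # '' (the empty string), not a true null

$asBool                # False
$asInt                 # 0
$asString.Length       # 0
$null -eq $asString    # False, because the result is an empty string
```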
I think the fourth option works (generic method). Then why does a call with the second parameter $null work exactly the same as a call with the second parameter $false?
With the generic invocation syntax available in v7.3+, the generic overload definitely works - and a $null as the default-value argument is converted to the type specified as the type argument (assuming PowerShell allows such a conversion; it wouldn't work with [datetime], for instance, because [datetime] $null causes an error).
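Whether $null can be converted to a given type can be tried out on its own, without calling the library. A small sketch, with variable names chosen only for illustration:

```powershell
# [double] accepts a conversion from $null and yields 0.
$asDouble = [double] $null

# [datetime] does not; the cast raises an error that try / catch can trap.
$dateConversionFailed = $false
try {
    $null = [datetime] $null
} catch {
    $dateConversionFailed = $true
}

$asDouble                # 0
$dateConversionFailed    # True
```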
Even with the non-generic syntax, PowerShell does select the generic overload by inference, as the following example shows, but only when you pass an actual object rather than $null:
# Try to retrieve a non-existent attribute and provide a [double]
# default value.
# The fact that a [double] instance is returned implies that the
# generic overload was chosen.
# -> 'System.Double'
$nodes[0].GetAttributeValue('nosuch', [double] $null).GetType().FullName
In the C# source code, the fourth option requires the following condition to work [...]
When you pass $null, the generic overload is not considered - and cannot be, in the absence of type information - so this doesn't make a difference.
[1] As of this writing, v7.3 hasn't been released yet, but preview versions are available - see the repo.