This is a very simple example
$Test = @('ae','æ')
$Test | Select-Object -Unique
The output:
ae
What is going on here, and how can I avoid it? Obviously I do not want "ae" to be equal to "æ".
CodePudding user response:
As mentioned in the comments, your current culture settings identify ae and æ as equal, so it's only returning the first one in the input array.
If you reverse the order you'll get æ instead:
$Test = @('æ','ae')
$Test | Select-Object -Unique
# æ
You can check which culture PowerShell is using with this:
PS> Get-Culture
LCID Name DisplayName
---- ---- -----------
2057 en-GB English (United Kingdom)
Rather than a culture-aware comparison, it sounds like what you're after is an "ordinal" comparison - for more details see Ordinal String Operations:
Ordinal comparisons are string comparisons in which each byte of each string is compared without linguistic interpretation; for example, "windows" does not match "Windows".
(And by extension, ae does not equal æ.)
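As a rough illustration of the difference, the sketch below compares the two strings from the question using the .NET `[string]::Equals` overload that takes a `StringComparison`. The variable names `$a` and `$b` are just placeholders. No result is shown for the culture-sensitive call, because it depends on the current culture and on the globalization data the .NET runtime uses.

```powershell
$a = 'ae'
$b = 'æ'

# Ordinal: compares the underlying character codes, no linguistic rules applied.
[string]::Equals($a, $b, [System.StringComparison]::Ordinal)
# False

# Culture-sensitive: the result depends on the current culture and the runtime,
# so no expected output is given here.
[string]::Equals($a, $b, [System.StringComparison]::CurrentCulture)
```

This only checks a single pair of strings; it does not change how Select-Object -Unique compares its input.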
I can't find an idiomatic way to make Select-Object -Unique compare ordinally in PowerShell (you can change the culture with Set-Culture, but all the ones I tried still treat ae as equal to æ). If you want more control over how values are compared, you could drop down into LINQ like this:
PS> $data = @( "ae", "æ" )
PS> [System.Linq.Enumerable]::Distinct([string[]]$data, [System.StringComparer]::Ordinal )
ae
æ
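If you would rather keep a pipeline style, one possible sketch is a small helper built on a `HashSet[string]` with the ordinal comparer. The function name `Get-OrdinalUnique` and its parameter name are made up for this example; it relies on `HashSet.Add` returning `$false` for a value that is already present, and it assumes PowerShell 5 or later for the `::new()` syntax.

```powershell
function Get-OrdinalUnique {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)]
        [string]$InputString
    )
    begin {
        # Remembers every string seen so far, compared ordinally.
        $seen = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::Ordinal)
    }
    process {
        # Add() returns $true only the first time a given string is seen,
        # so each distinct string is emitted once, in input order.
        if ($seen.Add($InputString)) { $InputString }
    }
}

@('ae', 'æ', 'ae') | Get-OrdinalUnique
# ae
# æ
```

Unlike the LINQ call, this streams items as they arrive, so it can sit in the middle of a longer pipeline.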
You've then got a whole bunch of different ways to compare strings:
https://docs.microsoft.com/en-us/dotnet/api/system.stringcomparer?view=net-6.0#properties
CurrentCulture - Gets a StringComparer object that performs a case-sensitive string comparison using the word comparison rules of the current culture.
CurrentCultureIgnoreCase - Gets a StringComparer object that performs case-insensitive string comparisons using the word comparison rules of the current culture.
InvariantCulture - Gets a StringComparer object that performs a case-sensitive string comparison using the word comparison rules of the invariant culture.
InvariantCultureIgnoreCase - Gets a StringComparer object that performs a case-insensitive string comparison using the word comparison rules of the invariant culture.
Ordinal - Gets a StringComparer object that performs a case-sensitive ordinal string comparison.
OrdinalIgnoreCase - Gets a StringComparer object that performs a case-insensitive ordinal string comparison.
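To make the difference between two of these comparers concrete, here is a small sketch that runs the same input through Ordinal and OrdinalIgnoreCase. The sample data is invented for illustration; the culture-based comparers are left out because their results vary with culture and runtime.

```powershell
$data = [string[]]@('Windows', 'windows', 'ae', 'æ')

# Case-sensitive ordinal: all four strings are distinct.
[System.Linq.Enumerable]::Distinct($data, [System.StringComparer]::Ordinal)
# Windows
# windows
# ae
# æ

# Case-insensitive ordinal: 'windows' collapses into 'Windows',
# but 'ae' and 'æ' are still different strings.
[System.Linq.Enumerable]::Distinct($data, [System.StringComparer]::OrdinalIgnoreCase)
# Windows
# ae
# æ
```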
You can even implement your own comparer:
class FirstLetterComparer : System.Collections.Generic.IEqualityComparer[string] {
[bool]Equals([string]$x, [string]$y) { return $x[0] -eq $y[0]; }
[int]GetHashCode([string] $x) { return $x[0].GetHashCode(); }
}
# returns the first item in the list that starts with each distinct character.
# note that "abb" is omitted because it starts with the same first letter as "aaa"
# so it's not "first letter distinct".
$data = @( "aaa", "abb", "bbb" )
[System.Linq.Enumerable]::Distinct([string[]]$data, [FirstLetterComparer]::new() )
# aaa
# bbb