PowerShell -Unique æ converted to ae

Time:06-16

This is a very simple example:

$Test = @('ae','æ')
$Test | Select-Object -Unique

The output:

ae

What is going on here, and how can I avoid it? Obviously, I do not want "ae" to be treated as equal to "æ".

CodePudding user response:

As mentioned in the comments, your current culture settings treat ae and æ as equal, so Select-Object -Unique returns only the first one in the input array.

If you reverse the order you'll get æ instead:

$Test = @('æ','ae')
$Test | Select-Object -Unique
# æ

You can check which culture PowerShell is using with this:

PS> Get-Culture

LCID             Name             DisplayName
----             ----             -----------
2057             en-GB            English (United Kingdom)

Rather than a culture-aware comparison, it sounds like what you're after is an "ordinal" comparison. For more details, see Ordinal String Operations:

Ordinal comparisons are string comparisons in which each byte of each string is compared without linguistic interpretation; for example, "windows" does not match "Windows".

(And, by extension, ae does not equal æ.)
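
To see the difference on a given machine, the two comparison styles can be tried side by side. This is a minimal sketch using .NET's String.Equals with a StringComparison value; the variable names are only illustrative. The culture-sensitive result is deliberately shown without an expected value, because it depends on the current culture and on the globalization backend of the .NET runtime in use.

```powershell
$a = 'ae'
$b = 'æ'

# Culture-sensitive comparison: the result depends on Get-Culture and the runtime.
$cultureEqual = [string]::Equals($a, $b, [System.StringComparison]::CurrentCulture)

# Ordinal comparison: compares the raw character values, with no linguistic rules applied.
$ordinalEqual = [string]::Equals($a, $b, [System.StringComparison]::Ordinal)

"CurrentCulture: $cultureEqual"
"Ordinal:        $ordinalEqual"   # Ordinal:        False
```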

I can't find an idiomatic way to do that in PowerShell. You can change the culture with Set-Culture, but all the ones I tried still treat ae as equal to æ. If you want more control over how values are compared, you could drop down into LINQ like this:

PS> $data = @( "ae", "æ" )
PS> [System.Linq.Enumerable]::Distinct([string[]]$data, [System.StringComparer]::Ordinal )
ae
æ
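
If you would rather stay in the pipeline, another option is to filter through a HashSet built with an ordinal comparer. This is a sketch rather than a built-in feature, and $seen is just an illustrative variable name. HashSet.Add returns $true only the first time a value is added, so Where-Object passes each distinct string once, in input order.

```powershell
$data = @('ae', 'æ', 'ae')

# HashSet that decides equality with an ordinal, case-sensitive comparer.
$seen = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::Ordinal)

# Add() returns $false for values already in the set, so duplicates are filtered out.
$unique = @($data | Where-Object { $seen.Add($_) })
$unique
# ae
# æ
```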

You've then got a whole bunch of different ways to compare strings:

https://docs.microsoft.com/en-us/dotnet/api/system.stringcomparer?view=net-6.0#properties

  • CurrentCulture - Gets a StringComparer object that performs a case-sensitive string comparison using the word comparison rules of the current culture.

  • CurrentCultureIgnoreCase - Gets a StringComparer object that performs case-insensitive string comparisons using the word comparison rules of the current culture.

  • InvariantCulture - Gets a StringComparer object that performs a case-sensitive string comparison using the word comparison rules of the invariant culture.

  • InvariantCultureIgnoreCase - Gets a StringComparer object that performs a case-insensitive string comparison using the word comparison rules of the invariant culture.

  • Ordinal - Gets a StringComparer object that performs a case-sensitive ordinal string comparison.

  • OrdinalIgnoreCase - Gets a StringComparer object that performs a case-insensitive ordinal string comparison.
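
As a rough illustration of how the choice of comparer changes the result, the sketch below runs the same Distinct call with Ordinal and with OrdinalIgnoreCase. The sample array is made up for the example.

```powershell
$words = [string[]]@('Windows', 'windows', 'WINDOWS')

# Case-sensitive ordinal: all three spellings are distinct.
$ordinal = @([System.Linq.Enumerable]::Distinct($words, [System.StringComparer]::Ordinal))

# Case-insensitive ordinal: only the first spelling survives.
$ignoreCase = @([System.Linq.Enumerable]::Distinct($words, [System.StringComparer]::OrdinalIgnoreCase))

"Ordinal:           $($ordinal -join ', ')"      # Ordinal:           Windows, windows, WINDOWS
"OrdinalIgnoreCase: $($ignoreCase -join ', ')"   # OrdinalIgnoreCase: Windows
```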

You can even implement your own:

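# Custom equality comparer: two strings count as equal when their first characters match.
class FirstLetterComparer : System.Collections.Generic.IEqualityComparer[string] {
  [bool] Equals([string]$x, [string]$y) { return $x[0] -eq $y[0] }
  # Items that compare equal must hash alike, so hash only the first character.
  [int] GetHashCode([string]$x) { return $x[0].GetHashCode() }
}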

# returns the first item in the list that starts with each distinct character.
# note that "abb" is omitted because it starts with the same first letter as "aaa"
# so it's not "first letter distinct".
$data = @( "aaa", "abb", "bbb" )
[System.Linq.Enumerable]::Distinct([string[]]$data, [FirstLetterComparer]::new() )
# aaa
# bbb