Background
It is quite common in PowerShell to build a hash table to quickly access objects by a specific property, e.g. to base an index on the LastName:
$List = ConvertFrom-Csv @'
Id, LastName, FirstName, Country
1, Aerts, Ronald, Belgium
2, Berg, Ashly, Germany
3, Cook, James, England
4, Duval, Frank, France
5, Lyberg, Ash, England
6, Fischer, Adam, Germany
'@
$Index = @{}
$List |ForEach-Object { $Index[$_.LastName] = $_ }
$Index.Cook
Id LastName FirstName Country
-- -------- --------- -------
3 Cook James England
In some cases it is required to build the index on two (or even more) properties, e.g. the FirstName and the LastName. For this you might create a multidimensional key, e.g.:
$Index = @{}
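$List |ForEach-Object {
    # Create the inner table only once per first name, so earlier entries are kept
    if (-not $Index.ContainsKey($_.FirstName)) { $Index[$_.FirstName] = @{} }
    $Index[$_.FirstName][$_.LastName] = $_
}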
$Index.James.Cook
Id LastName FirstName Country
-- -------- --------- -------
3 Cook James England
But it is easier (and possibly even faster) to just concatenate the two properties, if only for checking whether an entry exists: with the nested index, $Index['James'].ContainsKey('Cook') throws an error if the FirstName doesn't exist.
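For comparison, a guarded existence check on the nested index could look like the following minimal sketch (the sample entry and the variable names $hasCook and $hasJohn are illustrative only):

```powershell
# Nested index with a single illustrative entry
$Index = @{ James = @{ Cook = 'entry 3' } }

# Guard the outer lookup first; -and short-circuits, so the inner
# ContainsKey is only called when the first name exists.
$hasCook = $Index.ContainsKey('James') -and $Index['James'].ContainsKey('Cook')
$hasJohn = $Index.ContainsKey('John')  -and $Index['John'].ContainsKey('Cook')

$hasCook   # True
$hasJohn   # False, and no error is raised
```

This works, but every lookup needs one guard per level, which is part of what makes a single joined key attractive.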
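To join the properties, it is required to use a delimiter between them, otherwise different property lists might end up as the same key. For example, without a delimiter, Ashly Berg becomes AshlyBerg and Ash Lyberg becomes AshLyberg, which are the same key because hash tables created with @{} compare string keys case-insensitively.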
$Index = @{}
$List |ForEach-Object { $Index["$($_.FirstName)`t$($_.LastName)"] = $_ }
$Index."James`tCook"
Id LastName FirstName Country
-- -------- --------- -------
3 Cook James England
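Because the number of properties used in the index can vary, the join can be wrapped in a small helper. The following is only a sketch: the function name Get-JoinedKey and its parameters are hypothetical (not a built-in cmdlet), and the tab delimiter is an assumption that only holds if no property value contains a tab.

```powershell
# Hypothetical helper: builds one string key from any number of properties.
function Get-JoinedKey {
    param(
        [Parameter(Mandatory)] $InputObject,
        [Parameter(Mandatory)] [string[]] $Property,
        [string] $Delimiter = "`t"   # assumption: never occurs in the values
    )
    # Read each named property and join the values with the delimiter
    $values = foreach ($name in $Property) { $InputObject.$name }
    $values -join $Delimiter
}

$List = ConvertFrom-Csv @'
Id, LastName, FirstName, Country
2, Berg, Ashly, Germany
5, Lyberg, Ash, England
'@

$Index = @{}
$NoDelimiter = @{}
foreach ($item in $List) {
    $Index[(Get-JoinedKey -InputObject $item -Property FirstName, LastName)] = $item
    $NoDelimiter[(Get-JoinedKey -InputObject $item -Property FirstName, LastName -Delimiter '')] = $item
}

$Index.Count         # 2: the delimiter keeps the two keys apart
$NoDelimiter.Count   # 1: AshlyBerg and AshLyberg collide (case-insensitive keys)
```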
Note: the above are Minimal, Reproducible Examples. In real life, I have run into the questions below several times, usually when joining objects in general, where both the context and the number of properties used in the index vary.
Questions:
- Is it a good practice to join (concatenate) properties for such a situation?
- If yes, is there a (standard?) delimiter for this? (meaning a character -or a sequence of characters- that should never be used/exist in a property name)
Answer:
Instead of joining the keys, I suggest using a "split key" with the help of the Tuple class. In this case there is no need for a delimiter, as the keys are not joined but stored as separate properties in an object. The Tuple class implements the equality and hash code members a dictionary relies on, so the tuple acts like a single key when used in any dictionary type (e.g. Hashtable).
$List = ConvertFrom-Csv @'
Id, LastName, FirstName, Country
1, Aerts, Ronald, Belgium
2, Berg, Ashly, Germany
3, Cook, James, England
4, Duval, Frank, France
5, Lyberg, Ash, England
6, Fischer, Adam, Germany
'@
$Index = @{}
$List.ForEach{ $Index[ [Tuple]::Create( $_.LastName, $_.FirstName ) ] = $_ }
$Index
When written to the console, the split key gets nicely formatted:
Name Value
---- -----
(Berg, Ashly) @{Id=2; LastName=Berg; FirstName=Ashly; Country=Germany}
(Lyberg, Ash) @{Id=5; LastName=Lyberg; FirstName=Ash; Country=England}
(Duval, Frank) @{Id=4; LastName=Duval; FirstName=Frank; Country=France}
(Aerts, Ronald) @{Id=1; LastName=Aerts; FirstName=Ronald; Country=Belgium}
(Cook, James) @{Id=3; LastName=Cook; FirstName=James; Country=England}
(Fischer, Adam) @{Id=6; LastName=Fischer; FirstName=Adam; Country=Germany}
To look up an entry, create a temporary tuple:
$Index[ [Tuple]::Create('Duval','Frank') ]
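One point to keep in mind when looking up entries: a hash table created with @{} compares plain string keys case-insensitively, but a tuple key is compared through the tuple's own Equals method, which to my understanding compares its string items case-sensitively. A minimal sketch to check this (the stored value 'entry 4' and the variable names are illustrative):

```powershell
$Index = @{}
$Index[ [Tuple]::Create('Duval', 'Frank') ] = 'entry 4'

# Same items, same casing: a new tuple instance still matches the stored key
$sameCase  = $Index.ContainsKey( [Tuple]::Create('Duval', 'Frank') )
# Same items, different casing: not considered equal
$otherCase = $Index.ContainsKey( [Tuple]::Create('duval', 'frank') )

$sameCase    # True
$otherCase   # False
```

If case-insensitive lookups are needed, one option is to normalize the parts (for example with ToLowerInvariant()) both when building the index and when looking up.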
An advantage of the Tuple class is that you can easily get the individual keys that make up the split key, without having to split a string:
# Using member access enumeration
$Index.Keys.Item1 # Prints all last names
$Index.Keys.Item2 # Prints all first names
# Using the enumerator to loop over the index
$Index.GetEnumerator().ForEach{ $_.Key.Item1 }
.NET Framework 4.7 adds the ValueTuple struct, which is a value type, whereas Tuple is a reference type. It might be worth testing whether it gives better performance for this use case. Replacing Hashtable with a generic Dictionary could improve performance as well:
$Index = [Collections.Generic.Dictionary[ ValueTuple[String,String], object]]::new()
Apart from construction of the dictionary, ValueTuple can be used like Tuple. Simply replace Tuple by ValueTuple in the previous code samples.
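Put together, the ValueTuple variant might look like the sketch below. It assumes a runtime where ValueTuple is available (.NET Framework 4.7 or later, or PowerShell 6+); the [string] casts and the use of TryGetValue are my own choices to keep the key type and the missing-key behaviour explicit, not something the approach requires.

```powershell
$List = ConvertFrom-Csv @'
Id, LastName, FirstName, Country
3, Cook, James, England
4, Duval, Frank, France
'@

# Strongly typed dictionary keyed by a (LastName, FirstName) value tuple
$Index = [Collections.Generic.Dictionary[ValueTuple[string,string], object]]::new()
foreach ($item in $List) {
    $key = [ValueTuple]::Create([string]$item.LastName, [string]$item.FirstName)
    $Index[$key] = $item
}

# TryGetValue returns $false instead of failing when the key is missing
$found = $null
$ok = $Index.TryGetValue([ValueTuple]::Create('Duval', 'Frank'), [ref]$found)
if ($ok) { $found.Country }   # France
```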