Does there exist a designated (sub)index delimiter?

Time:06-28

Background

It is quite common in PowerShell to build a hash table to quickly access objects by a specific property, e.g. to base an index on the LastName:

$List =  ConvertFrom-Csv @'
Id, LastName, FirstName, Country
 1, Aerts,    Ronald,    Belgium
 2, Berg,     Ashly,     Germany
 3, Cook,     James,     England
 4, Duval,    Frank,     France
 5, Lyberg,   Ash,       England
 6, Fischer,  Adam,      Germany
'@

$Index = @{}
$List |ForEach-Object { $Index[$_.LastName] = $_ }

$Index.Cook

Id LastName FirstName Country
-- -------- --------- -------
3  Cook     James     England

In some cases it is required to build the index on two (or even more) properties, e.g. the FirstName and the LastName. For this you might create a multidimensional index of nested hash tables, e.g.:

$Index = @{}
$List |ForEach-Object {
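     if (-not $Index.ContainsKey($_.FirstName)) { $Index[$_.FirstName] = @{} }  # keep an existing inner table for repeated first names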
     $Index[$_.FirstName][$_.LastName] = $_
}

$Index.James.Cook

Id LastName FirstName Country
-- -------- --------- -------
3  Cook     James     England

But it is easier (and possibly even faster) to just concatenate the two properties into a single key, if only to check whether an entry exists: with the nested index, $Index['James'].ContainsKey('Cook') raises an error if the FirstName doesn't exist, so the check needs an extra $Index.ContainsKey('James') first.
To join the properties, a delimiter is required between them; otherwise different property combinations might end up as the same key. For example, Ashly Berg and Ash Lyberg would yield AshlyBerg and AshLyberg, which are the same key in a PowerShell hash table because its string keys are case-insensitive by default.
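The collision described above can be reproduced in a few lines. This is only a minimal sketch: the $Rows, $Plain and $Delimited names are made up for the illustration, and the tab is an arbitrary delimiter choice that is assumed not to occur in the values.

```powershell
# Two sample rows mirroring the colliding names from the list above.
$Rows = @(
    [pscustomobject]@{ FirstName = 'Ashly'; LastName = 'Berg' }
    [pscustomobject]@{ FirstName = 'Ash';   LastName = 'Lyberg' }
)

# Without a delimiter both rows map to the same (case-insensitive) key.
$Plain = @{}
foreach ($Row in $Rows) { $Plain[$Row.FirstName + $Row.LastName] = $Row }

# With a delimiter (a tab here, an assumption) the keys stay distinct.
$Delimited = @{}
foreach ($Row in $Rows) { $Delimited[$Row.FirstName + "`t" + $Row.LastName] = $Row }

$Plain.Count       # 1: the second row silently replaced the first
$Delimited.Count   # 2

# A single ContainsKey call is enough to test for existence.
$Delimited.ContainsKey("Ash`tLyberg")   # True
```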

$Index = @{}
$List |ForEach-Object { $Index["$($_.FirstName)`t$($_.LastName)"] = $_ }

$Index."James`tCook"

Id LastName FirstName Country
-- -------- --------- -------
3  Cook     James     England

Note: the above are minimal, reproducible examples. In real life, I have run into the questions below several times, including when joining objects in general, where the background and the number of properties used in the index vary.
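To make the "variable number of properties" case concrete, here is a rough sketch of how such a joined key might be built. Get-JoinedKey is a hypothetical helper written for this illustration (it is not an existing cmdlet), and its default tab delimiter is exactly the kind of assumption the questions are about.

```powershell
# Hypothetical helper: builds one key string from any number of properties.
function Get-JoinedKey {
    param(
        [Parameter(Mandatory)] $InputObject,
        [Parameter(Mandatory)] [string[]] $Property,
        [string] $Delimiter = "`t"   # assumed never to occur in the values
    )
    # Collect the value of each requested property, then join them.
    $Values = foreach ($Name in $Property) { $InputObject.$Name }
    $Values -join $Delimiter
}

$People = @(
    [pscustomobject]@{ FirstName = 'James'; LastName = 'Cook';  Country = 'England' }
    [pscustomobject]@{ FirstName = 'Frank'; LastName = 'Duval'; Country = 'France' }
)

$Index = @{}
foreach ($Person in $People) {
    $Key = Get-JoinedKey -InputObject $Person -Property FirstName, LastName
    $Index[$Key] = $Person
}

$Index["James`tCook"].Country
```

The same helper accepts one, two or more property names, which is why the choice of delimiter matters so much here.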

Questions:

  1. Is it a good practice to join (concatenate) properties for such a situation?
  2. If yes, is there a (standard?) delimiter for this? (meaning a character -or a sequence of characters- that should never be used/exist in a property name)

Answer:

Instead of joining the keys, I suggest using a "split key" with the help of the Tuple class. In this case there is no need for a delimiter, as the keys are not joined but stored as separate properties in an object. The Tuple class overrides Equals and GetHashCode to compare by component value, so the tuple acts like a single key when used in any dictionary (e.g. Hashtable).

$List =  ConvertFrom-Csv @'
Id, LastName, FirstName, Country
 1, Aerts,    Ronald,    Belgium
 2, Berg,     Ashly,     Germany
 3, Cook,     James,     England
 4, Duval,    Frank,     France
 5, Lyberg,   Ash,       England
 6, Fischer,  Adam,      Germany
'@

$Index = @{}
$List.ForEach{ $Index[ [Tuple]::Create( $_.LastName, $_.FirstName ) ] = $_ }

$Index

When written to the console, the split key gets nicely formatted:

Name                           Value
----                           -----
(Berg, Ashly)                  @{Id=2; LastName=Berg; FirstName=Ashly; Country=Germany}
(Lyberg, Ash)                  @{Id=5; LastName=Lyberg; FirstName=Ash; Country=England}
(Duval, Frank)                 @{Id=4; LastName=Duval; FirstName=Frank; Country=France}
(Aerts, Ronald)                @{Id=1; LastName=Aerts; FirstName=Ronald; Country=Belgium}
(Cook, James)                  @{Id=3; LastName=Cook; FirstName=James; Country=England}
(Fischer, Adam)                @{Id=6; LastName=Fischer; FirstName=Adam; Country=Germany}

To look up an entry, create a temporary tuple:

$Index[ [Tuple]::Create('Duval','Frank') ]
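Checking for existence works the same way, with one ContainsKey call and no risk of a null-reference error. The sketch below uses a placeholder string as the value instead of the CSV row. It also shows a caveat worth knowing: unlike plain string keys in @{}, the components of a tuple key are compared case-sensitively.

```powershell
$Index = @{}
$Index[[Tuple]::Create('Duval', 'Frank')] = 'row 4'   # placeholder value

# Tuples compare by component value, so a fresh tuple finds the entry.
$Index.ContainsKey([Tuple]::Create('Duval', 'Frank'))    # True

# A missing key returns False instead of raising an error.
$Index.ContainsKey([Tuple]::Create('Duval', 'Nobody'))   # False

# Tuple components are compared case-sensitively.
$Index.ContainsKey([Tuple]::Create('duval', 'frank'))    # False
```

If case-insensitive matching is needed, one option is to normalize the components (for example with ToLowerInvariant()) both when building the index and when looking up.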

An advantage of the Tuple class is that you can easily get the individual keys that make up the split key, without having to split a string:

# Using member access enumeration
$Index.Keys.Item1  # Prints all last names
$Index.Keys.Item2  # Prints all first names

# Using the enumerator to loop over the index
$Index.GetEnumerator().ForEach{ $_.Key.Item1 }

The .NET Framework 4.7 adds the ValueTuple struct, which is a value type, whereas Tuple is a reference type. It might be worth testing whether it gives better performance for this use case. Also, replacing Hashtable with a generic Dictionary could improve performance as well:

$Index = [Collections.Generic.Dictionary[ ValueTuple[String,String], object]]::new()

Apart from the construction of the dictionary, ValueTuple can be used like Tuple. Simply replace Tuple with ValueTuple in the previous code samples.
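As a small end-to-end sketch of the ValueTuple variant (untested for performance, and using a made-up single entry rather than the full list), filling the generic dictionary and looking up an entry could look like this. TryGetValue is used so that a missing key does not throw; it assumes .NET Framework 4.7 or later, or PowerShell 7.

```powershell
$Index = [Collections.Generic.Dictionary[ValueTuple[string,string], object]]::new()

# Made-up sample entry; in practice this would be a row of $List.
$Index[[ValueTuple]::Create('Cook', 'James')] = [pscustomobject]@{ Id = 3; Country = 'England' }

# TryGetValue returns False for a missing key instead of throwing.
$Entry = $null
if ($Index.TryGetValue([ValueTuple]::Create('Cook', 'James'), [ref] $Entry)) {
    $Entry.Country
}

$Index.ContainsKey([ValueTuple]::Create('Cook', 'Nobody'))   # False
```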
