Splitting strings based on regex expression

Time:08-03

Is there a quick way to split a string based on a regex match in PowerShell?

The string "800G1" should read "800 G1", but when it comes in it is ignored because it is not found in an array of words that contains "800" and "G1" as separate elements. The pattern of the string will almost always be 3 digits, a letter, and 1 digit.

Is there a way to match the string "800G1" as "800" and "G1" by splitting the string and then looking up each part in my array?

Example Code:

##Incoming string from a loop
$str = "800G1"
##A pre loaded list of single words made from a list of PC models
$array = @("800"; "G1")

###The match has to determine if the incoming string can be used to build a model name.  
$match = ($array -eq $str) ##Need a regex expression to first split and find a match in the array

$match
##No Match
###I have tried this regex that may work but I cannot get it to match on the 3rd regex expression:  '^\d\d\d\G[0-9]$'
$reg =@('^G[0-9]$','^\d\d\d$', '^\d\d\d\G[0-9]$')
($tokens[1] | Select-String -pattern $reg -AllMatches).Matches.Value

$tokens[1] returns G1
$tokens[0] returns 800
But $str returns nothing, when it should return 800G1.

CodePudding user response:

# If the string matches a certain pattern, split it in two.
[array] $tokens = 
 if ($str -match '^(\d{3})([a-z]\d)$') { $Matches.1, $Matches.2 }
 else                                  { $str }

# Test if all tokens exist as elements in the array.
# -> $true, in this case.
$allTokensContainedInArray = 
  (Compare-Object $array $tokens).SideIndicator -notcontains '=>'
  • The regex-based -match operator tests whether $str consists of exactly 3 digits followed by a letter and a single digit (the pattern is anchored with ^ and $). If so, the capture groups ((...)) and the automatic $Matches variable are used to split the string into the 3-digit part and the rest.

  • The above uses Compare-Object to test (case-insensitively) if the array elements derived from the input string are all contained in the reference array, in any order, while allowing the reference array to contain additional elements.
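
To see both the matching and the non-matching case side by side, the two steps above can be wrapped in a small function. This is only a sketch: the function name Test-ModelTokens and its parameters are made up for illustration, and it swaps Compare-Object for a per-token -notin check, which should behave the same way for this purpose (case-insensitive, order-independent, extra elements in the reference array allowed).

```powershell
# Hypothetical helper; the name and parameters are illustrative only.
function Test-ModelTokens {
    param(
        [string]   $InputString,
        [string[]] $KnownWords
    )

    # Split "800G1"-style strings into two tokens; otherwise keep the string whole.
    $tokens = if ($InputString -match '^(\d{3})([a-z]\d)$') { $Matches[1], $Matches[2] } else { $InputString }

    # Collect the tokens that are NOT in the word list (-notin is case-insensitive).
    $missing = @($tokens | Where-Object { $_ -notin $KnownWords })

    # $true only if every token was found.
    $missing.Count -eq 0
}

Test-ModelTokens -InputString '800G1' -KnownWords '800', 'G1'   # $true
Test-ModelTokens -InputString '800G2' -KnownWords '800', 'G1'   # $false ("G2" is missing)
```

Note that -notin requires PowerShell 3.0 or later; on older versions, $KnownWords -notcontains $_ is the equivalent.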


If you want to accept only input strings that match one of the expected patterns, before even attempting a lookup in the array:

# If no pattern matches, $tokens will be $null
[array] $tokens = 
  if     ($str -match '^(\d{3})([a-z]\d)$') { $Matches.1, $Matches.2 }
  elseif ($str -match '^\d{3}$')            { $str }
  elseif ($str -match '^[a-z]\d$')          { $str }
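
As a rough illustration of how the "no pattern matches" case might be handled in the loop that feeds in the strings, here is a sketch. The function name Get-ModelTokens and the sample input 'ProBook' are assumptions for the example, not part of the original code. Wrapping the call in @(...) guarantees an array, so an empty result can be detected via .Count.

```powershell
# Hypothetical helper; emits 2 tokens, 1 token, or nothing at all.
function Get-ModelTokens {
    param([string] $InputString)

    if     ($InputString -match '^(\d{3})([a-z]\d)$') { $Matches[1], $Matches[2] }
    elseif ($InputString -match '^\d{3}$')            { $InputString }
    elseif ($InputString -match '^[a-z]\d$')          { $InputString }
    # No output when none of the patterns match.
}

foreach ($s in '800G1', '800', 'G1', 'ProBook') {
    # @(...) turns "no output" into an empty array.
    $tokens = @(Get-ModelTokens $s)

    if ($tokens.Count -eq 0) { "${s}: skipped" }
    else                     { "${s}: $($tokens -join ' ')" }
}
# 800G1: 800 G1
# 800: 800
# G1: G1
# ProBook: skipped
```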

CodePudding user response:

With the help of code from @mklement0 I have come up with the following solution. This is now working as required.
I am sure that this code could probably be slimmed down though.

$token = $null
$str = $null
$modeld = $null

$str = "800G1"
$array = @("800"; "G1")

[array] $tokens = if     ($str -match '^(\d{3})([G]\d)$') 
{ $Matches.1, $Matches.2 }

#$tokens = $str -split '(?=\D)'
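ForEach ($token in $tokens)
{
    # -eq against an array returns the matching elements (empty if none match).
    $matchd = ($array -eq $token)
    $strd = $matchd
    $match = ($array -eq $str)

    If ($matchd -ne $null)
    {
        # Append the matched word to the model name, separated by a space.
        $modeld = $modeld + (" " + $matchd)
        Write-host "Model Building: $modeld"
    }
}
Write-host "Model Built: $modeld"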
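
On slimming this down: one possible shorter version is sketched below. It is not a drop-in replacement, since it makes two changes that are my own assumptions. It uses the -split operator with lookarounds, so it splits at any digit-to-letter boundary rather than only on the "3 digits, G, 1 digit" shape, and it joins the tokens with -join, so the result has no leading space.

```powershell
$str   = '800G1'
$array = @('800', 'G1')

# Split at the zero-width position between a digit and a following letter:
# "800G1" -> "800", "G1". Strings without such a boundary stay in one piece.
$tokens = $str -split '(?<=\d)(?=[a-z])'

# Keep only the tokens that appear in the word list, then join them with spaces.
$modeld = ($tokens | Where-Object { $_ -in $array }) -join ' '

Write-Host "Model Built: $modeld"   # Model Built: 800 G1
```

If the stricter shape matters, the -match pattern from the code above can be kept and only the ForEach loop replaced by the Where-Object / -join line.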