Can a Regex capture multiple similar patterns in a text block?

Time:10-30

I'm trying to extract specific words from text blocks. The blocks themselves look like this:

soundeffect = {
    name = event_ship_explosion
    sounds = {
        sound = event_ship_explosion
    }  
    volume = 0.2
    max_audible = 1
    max_audible_behaviour = fail
}

soundeffect = {
    name = event_ship_bridge
    sounds = {
        sound = event_ship_bridge
        sound = event_ship_bridge_02
        sound = event_ship_bridge_03
    }  
    volume = 0.3
    max_audible = 1
    max_audible_behaviour = fail        
}

They are quite similar, but the main difference is that inside a sounds block there can be multiple instances of sound. Whether there is a single instance or several, I want to be able to extract all of them.

Basically, for each soundeffect block, I want to extract a pair containing what's on the right of name = and what's on the right of sound = (multiple times, if applicable).

I came up with this RegEx so far and have been testing it on websites like Regex101.

The RegEx: name\s*=\s*(?<Group>\w+)\s*sounds\s*=\s*{\s*(?>sound\s*=\s*(?<Sound>\w+)\s*)+}

However, when using PowerShell, only the last instance of what's to the right of sound = is retained; the previous matches are overwritten. I want to transform all of that into an array of objects that I can then work with (using ConvertFrom-Text - https://github.com/jdhitsolutions/PSScriptTools/blob/master/docs/ConvertFrom-Text.md). If you take the second soundeffect block, this is the result I get:

Group                                 Sound
-----                                 -----
event_ship_bridge                     event_ship_bridge_03

Whereas I'm looking for something like this instead:

Group                                 Sound
-----                                 -----
event_ship_bridge                     @(event_ship_bridge, event_ship_bridge_02, event_ship_bridge_03)

I'm starting to wonder whether what I want to do is even possible with RegEx, so I'm interested in any input you can give me. :)

Edit: here's the PowerShell code I'm using at the moment:

$Text | Select-String -Pattern $Regex -AllMatches | ForEach-Object {
    [string]$GroupName = $_.Matches.Groups[$Regex.GroupNumberFromName("Group")] | % { $_.Value }
    [string]$SoundName = $_.Matches.Groups[$Regex.GroupNumberFromName("Sound")] | % { $_.Value }

}

$GroupName
$SoundName

This returns the following result, as if the -AllMatches parameter were not respected.

event_ship_explosion
event_ship_explosion

CodePudding user response:

There seem to be three problems in your PowerShell code.

  1. Since there are multiple matches, you need another loop to go through all of them.

  2. The "Sound" group has multiple captures, so you need to look at the Captures member to retrieve them all.

  3. You're writing the results to variables rather than returning them. The variables will of course get overwritten in every iteration of the loop, so reading them outside the loop will only give you the last value.

Try something like this:

PS > $Text | Select-String -Pattern $Regex -AllMatches | ForEach-Object { $_.Matches } | ForEach-Object {
>>     [pscustomobject]@{
>>         GroupName = $_.Groups[$Regex.GroupNumberFromName("Group")].Value
>>         SoundName = $_.Groups[$Regex.GroupNumberFromName("Sound")].Captures.Value
>>     }
>> }
>>

GroupName            SoundName
---------            ---------
event_ship_explosion event_ship_explosion
event_ship_bridge    {event_ship_bridge, event_ship_bridge_02, event_ship_bridge_03}
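
To see the second point in isolation, here is a minimal sketch that skips Select-String and calls the .NET regex object directly. It assumes the question's pattern with its "+" quantifiers in place, and the sample text is a trimmed copy of the second soundeffect block. The variable names ($Match, $SoundGroup, $LastOnly, $AllSounds) are only for illustration.

```powershell
# Sample input: a trimmed copy of the second soundeffect block from the question.
$Text = @'
soundeffect = {
    name = event_ship_bridge
    sounds = {
        sound = event_ship_bridge
        sound = event_ship_bridge_02
        sound = event_ship_bridge_03
    }
    volume = 0.3
}
'@

# Assumed pattern: the question's regex with the "+" quantifiers present.
$Regex = [regex]'name\s*=\s*(?<Group>\w+)\s*sounds\s*=\s*{\s*(?>sound\s*=\s*(?<Sound>\w+)\s*)+}'

$Match = $Regex.Match($Text)
$SoundGroup = $Match.Groups['Sound']

# .Value on a repeated group exposes only its last capture.
$LastOnly = $SoundGroup.Value

# .Captures keeps one entry per iteration of the group, in order.
$AllSounds = @($SoundGroup.Captures | ForEach-Object { $_.Value })

$LastOnly                 # event_ship_bridge_03
$AllSounds -join ', '     # event_ship_bridge, event_ship_bridge_02, event_ship_bridge_03
```

This is why reading Value on the "Sound" group looked as if earlier matches had been overwritten: the earlier captures are still there, just under Captures.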

CodePudding user response:

A slightly different approach (I am pretty new to PowerShell):

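$txt = Get-Content './soundeffect.txt'

$txt |
    Select-String -Pattern "name =.*|sound =.*" |
    ForEach-Object {
        # Split each matching line into key and value around "="
        $a = $_.Line.Trim().Split("=")
        # Remember the name of the soundeffect block we are currently in
        if ($a[0].Trim() -eq "name") { $soundeffect = $a[1].Trim() }
        # Emit one "name,  sound" pair for every sound line
        if ($a[0].Trim() -eq "sound") {
            $sound = $a[1].Trim()
            "$soundeffect,  $sound"
        }
    }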

output:

event_ship_explosion,  event_ship_explosion
event_ship_bridge,  event_ship_bridge
event_ship_bridge,  event_ship_bridge_02
event_ship_bridge,  event_ship_bridge_03
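
If you would rather get one object per soundeffect with all of its sounds in an array (the shape asked for in the question), the same line-by-line idea can be extended. This is only a sketch: $Lines is a hypothetical stand-in for the output of Get-Content, and it assumes each name appears in only one block and that every sound line follows its block's name line.

```powershell
# Hypothetical sample lines standing in for: Get-Content './soundeffect.txt'
$Lines = @(
    'soundeffect = {'
    '    name = event_ship_explosion'
    '    sounds = {'
    '        sound = event_ship_explosion'
    '    }'
    '}'
    'soundeffect = {'
    '    name = event_ship_bridge'
    '    sounds = {'
    '        sound = event_ship_bridge'
    '        sound = event_ship_bridge_02'
    '        sound = event_ship_bridge_03'
    '    }'
    '}'
)

# Ordered dictionary: block name -> array of sounds, in file order.
$Result = [ordered]@{}
$Current = $null

foreach ($Line in $Lines) {
    if ($Line -match '^\s*name\s*=\s*(\w+)') {
        # A name line starts a new entry.
        $Current = $Matches[1]
        $Result[$Current] = @()
    }
    elseif ($Current -and $Line -match '^\s*sound\s*=\s*(\w+)') {
        # A sound line is appended to the most recent name.
        # "sounds = {" does not match because "=" must follow "sound".
        $Result[$Current] += $Matches[1]
    }
}

# Turn the dictionary into objects with an array-valued SoundName property.
$Objects = foreach ($Name in $Result.Keys) {
    [pscustomobject]@{ GroupName = $Name; SoundName = $Result[$Name] }
}

$Objects
```

Each object then carries a GroupName and a SoundName array, so it can be piped on to whatever consumes the data.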