Regex captures commented rows, but matches nothing when I try to exclude them

Time:02-22

I'm trying to re-use a regex I'm using to parse another file. This file has some commented rows, and I need to make sure the commented rows aren't captured.

This is the string being parsed:

m_dwErrorCode = 0; 
m_dwOutError = HOP_OK;  
m_OutSeverity = CCC_INFORMATION;  
_stprintf(m_OutDevStr, _T(""));
if (0x00000000 & value)
{
    
    m_dwErrorCode = 0x0;
    /* Ready state. */
    // m_StatusStr = " Ready(eSTATUS_READY)";
}
if (0x00000001 & value)
{
    m_bProceeding = true;
    /* proceed */
    //m_StatusStr = " Proceeding(eSTATUS)";
}
if (0x00002000 & value)
{
    m_bEmpty = true;
    // We only want to check this error only at certain times.
    if (m_bCheckEmpty)
    {
        if ((m_Attributes.dwMediaID == CUBE1) ||
            (m_Attributes.dwMediaID == CUBE2) ||
            /*(m_Attributes.dwMediaID == SCALLOPED) ||*/ // Added
            (m_Attributes.dwMediaID == FOLDED))
        {

            m_dwErrorCode = 0x00002000;
            _stprintf(m_OutDevStr, _T("0x1000 - %s(MP Tray Empty)"), errorStr);
            m_dwOutError = HOP_TRAY_EMPTY;
            m_OutSeverity = CCC_INFORMATION;
        }
    }
    //HOP_TRAY_EMPTY
    ///* MSI empty. */
    //m_bTrayEmpty = true;
    //// m_StatusStr = " MSI empty(eSTATUS_MSI_EMPTY)";
}
if (0x00004000 & value)
{
    /* empty. */
    m_dwErrorCode = 0x4000;
    _stprintf(m_OutDevStr, _T("0x4000 - %s(Tray 1 empty)"), errorStr);
    m_dwOutError = HOP_TRAY_01_EMPTY;
    m_OutSeverity = CCC_INFORMATION;
}
if (0x00008000 & value)
{
    /* Tray 2 empty. */
    m_dwErrorCode = 0x8000;
    _stprintf(m_OutDevStr, _T("0x8000 - %s(Tray 2 empty)"), errorStr);
    m_dwOutError = HOP_TRAY_02_EMPTY;
    m_OutSeverity = CCC_INFORMATION;
}
if (0x00010000 & value)
{
    /* Tray 3 empty. */
    m_dwErrorCode = 0x10000;
    _stprintf(m_OutDevStr, _T("0x10000 - %s(Tray 3 empty)"), errorStr);
    m_dwOutError = HOP_TRAY_03_EMPTY;
    m_OutSeverity = CCC_INFORMATION;
}

This is the code that gets it mostly right, except it captures the commented rows:

Function Get-CaseContents3240{
    [cmdletbinding()]
    Param ( [string]$parsedCaseMethod)
    Process
    {
 # construct regex
       $fullregex = [regex]"_stprintf[\s\S]*?_T\D*", # Start of error message, capture until digits
      "(?<sdkErr>[x\d]+)",       # Error number, digits only with x
      "\D[\s\S]*?",           # match anything, non-greedy
      "(?<sdkDesc>\((.+?)\))", # Error description, anything within parentheses, non-greedy
      "([\s\S]*?OutError\s*=(?<sdkOutErr>\s[a-zA-Z_0-9]*))", # Capture OutErr string 
      "[\s\S]*?",             # match anything, non-greedy
      "(?<sdkSeverity>OutSeverity\s*=\s[a-zA-Z_]*)", # Capture severity string and parse out part after underscore later
      '' -join ''
      
      # run the regex on the method contents
      $Values = $parsedCaseMethod | Select-String -Pattern $fullregex -AllMatches
  
      # Convert Name-Value pairs to object properties
      $result = foreach ($match in $Values.Matches){
        [PSCustomObject][ordered]@{
          sdkErr      = $($match.Groups['sdkErr'])
          sdkDesc     = $($match.Groups['sdkDesc'])
          sdkOutErr   = $($match.Groups['sdkOutErr'])
          sdkSeverity = ($match.Groups['sdkSeverity'] -split '_')[-1] #take part after _
        }
      }
  
      #add in content that doesn't fall in pattern###################


      #Write-Host "result:" $result -ForegroundColor Green
      #$result;
      return $result
       
    }#End of Process
  }#End of Function

This is what the results look like:

[Object[17]]
[0]:@{sdkErr=0x; sdkDesc=(tmpStr);sdkOutErr=HOP_OK;sdkSeverity=INFORMATION}
...

As you can see, the first match is picking up the commented-out lines.

I tried changing the first regex line to this to fix it, but when I do that, the result set is empty:

^[\s]+_stprintf[\s\S]*?_T\D*

These are the expected results:

sdkErr=0x1000                         ###missed this before
sdkDesc=MP Tray Empty
sdkOutErr=HOP_TRAY_EMPTY
sdkSeverity=INFORMATION

sdkErr=0x4000 
sdkDesc=Tray 1 empty
sdkOutErr=HOP_TRAY_01_EMPTY
sdkSeverity=INFORMATION

sdkErr=0x8000
sdkDesc=Tray 2 empty
sdkOutErr=HOP_TRAY_02_EMPTY
sdkSeverity=INFORMATION

sdkErr=0x10000
sdkDesc=Tray 3 empty
sdkOutErr=HOP_TRAY_03_EMPTY
sdkSeverity=INFORMATION
...

This is with PowerShell 5.1 and VS Code.

Update:

I'd like to keep the same data structure returned, just so everything is the same after the Function as what I have for other devices.

CodePudding user response:

It might be more maintainable to break it down into individual "if" blocks with one regex, and then parse each block in a second pass...

$code = Get-Content "myfile.c" -Raw;

# split into separate "if" blocks.
# (the funky "(?=...)" preserves the delimiter)
$blocks = $code -split "(?=if \(.* \& value\))";
# e.g.
# if (0x00004000 & value)
# {
#     /* empty. */
#     m_dwErrorCode = 0x4000;
#     _stprintf(m_OutDevStr, _T("0x4000 - %s(Tray 1 empty)"), errorStr);
#     m_dwOutError = HOP_TRAY_01_EMPTY;
#     m_OutSeverity = CCC_INFORMATION;
# }

$pattern = `
    "_stprintf[\s\S]*?_T\D*" +
    "(?<sdkErr>[x\d]+)" +
    "\D[\s\S]*?" +
    "(?<sdkDesc>\((.+?)\))" +
    "[\s\S]*?" +
    "(OutError\s*=\s*(?<sdkOutErr>[a-zA-Z_0-9]*))" +
    "[\s\S]*?" +
    "(?<sdkSeverity>OutSeverity\s*=\s[a-zA-Z_]*)";

# note - skip first block as it's the preamble before the first "if"
$blocks `
    | select-object -skip 1 `
    | select-string -pattern $pattern `
    | foreach-object {
         $match = $_.Matches[0];
         [PSCustomObject] [ordered] @{
              "sdkErr"      = $match.Groups['sdkErr']
              "sdkDesc"     = $match.Groups['sdkDesc']
              "sdkOutErr"   = $match.Groups['sdkOutErr']
              "sdkSeverity" = ($match.Groups['sdkSeverity'] -split '_')[-1]
        }
    };

Output is:

sdkErr  sdkDesc         sdkOutErr         sdkSeverity
------  -------         ---------         -----------
0x1000  (MP Tray Empty) HOP_TRAY_EMPTY    INFORMATION
0x4000  (Tray 1 empty)  HOP_TRAY_01_EMPTY INFORMATION
0x8000  (Tray 2 empty)  HOP_TRAY_02_EMPTY INFORMATION
0x10000 (Tray 3 empty)  HOP_TRAY_03_EMPTY INFORMATION
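
To see why the lookahead form of -split matters here, a minimal sketch may help. The sample string below is made up for illustration (it is not the real file); it compares a plain delimiter, which is consumed, with a "(?=...)" delimiter, which is kept at the start of each block.

```powershell
# Hypothetical miniature input: a one-line preamble followed by two "if" blocks.
$sample = "preamble;`nif (0x1 & value)`n{ a; }`nif (0x2 & value)`n{ b; }"

# Plain delimiter: the matched "if (...)" text is consumed and lost.
$plain = $sample -split 'if \(.* & value\)'

# Lookahead delimiter: the split happens just before each "if", so it is kept.
$kept = $sample -split '(?=if \(.* & value\))'

$kept.Count   # 3: the preamble plus two blocks
$kept[1]      # begins with "if (0x1 & value)"
$plain[1]     # no "if" header left, only the block body
```

Because the header stays attached to its block, the second pass can still tell which bit mask each message belongs to if that is ever needed.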

CodePudding user response:

This is not a robust solution: it works for the code currently posted, but I can't guarantee it will work with the actual code you test it on.

The regex expects a single string, so when testing this with your file, make sure you read it with Get-Content's -Raw switch.
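
As a rough illustration of why -Raw matters, here is a small sketch. The temp file name and the two-line sample are assumptions made up for the demo. Without -Raw, Get-Content returns one string per line, so a pattern that spans lines has nothing to match against.

```powershell
# Hypothetical sample written to a temp file so the sketch is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'raw-demo.c'
Set-Content -Path $path -Value @(
    '_stprintf(m_OutDevStr, _T("0x4000 - %s(Tray 1 empty)"), errorStr);'
    'm_dwOutError = HOP_TRAY_01_EMPTY;'
)

# A pattern that has to span both lines.
$demo = [regex]'(?s)_stprintf.*?m_dwOutError\s*=\s*(?<error>\w+)'

$lines = Get-Content $path        # array of strings, one per line
$text  = Get-Content $path -Raw   # a single string, newlines included

$lines.Count                                          # 2
@($lines | Where-Object { $demo.IsMatch($_) }).Count  # 0: no single line holds both parts
$demo.Match($text).Groups['error'].Value              # HOP_TRAY_01_EMPTY

Remove-Item $path
```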

See https://regex101.com/r/l0RLPw/1 for details.

$re = [regex]@'
(?xsi)
    _stprintf\([\w_,\s]+\("(?<code>\dx\d+)\s*
    -.*?\((?<description>[\w\s]+)\)"\).*?;\s*
    m_dwOutError\s*=\s*(?<error>[\w_]+);\s*
    m_OutSeverity\s*=\s*\w*?_(?<severity>\w+)
'@

$content = Get-Content path/to/content.ext -Raw

foreach($match in $re.Matches($content)) {
    [pscustomobject]@{
        sdkErr      = $match.Groups['code']
        sdkDesc     = $match.Groups['description']
        sdkOutErr   = $match.Groups['error']
        sdkSeverity = $match.Groups['severity']
    }
}

Result looks like this for me:

sdkErr  sdkDesc       sdkOutErr         sdkSeverity
------  -------       ---------         -----------
0x1000  MP Tray Empty HOP_TRAY_EMPTY    INFORMATION
0x4000  Tray 1 empty  HOP_TRAY_01_EMPTY INFORMATION
0x8000  Tray 2 empty  HOP_TRAY_02_EMPTY INFORMATION
0x10000 Tray 3 empty  HOP_TRAY_03_EMPTY INFORMATION
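
On the anchored attempt from the question (^[\s]+_stprintf...): in .NET regex, ^ matches only at the very start of the input unless the multiline option (?m) is turned on, which is a likely reason that variant returned nothing. The sketch below is illustrative only. The two-line sample and the shortened pattern are assumptions, and it uses [ \t]* instead of \s so the leading whitespace cannot run across line breaks. It skips calls commented out with //, but it does not handle /* ... */ block comments.

```powershell
# Hypothetical two-line sample: one commented-out call and one live call.
$text = @'
    //_stprintf(m_OutDevStr, _T("0x9999 - %s(Old message)"), errorStr);
    _stprintf(m_OutDevStr, _T("0x4000 - %s(Tray 1 empty)"), errorStr);
'@

# Without (?m), ^ matches only at the start of the whole string.
$noMultiline = [regex]'^[ \t]+_stprintf\D*(?<sdkErr>0x\d+)'

# With (?m), ^ matches at the start of every line.
$multiline = [regex]'(?m)^[ \t]*_stprintf\D*(?<sdkErr>0x\d+)'

$noMultiline.Matches($text).Count                # 0
$multiline.Matches($text).Count                  # 1
$multiline.Match($text).Groups['sdkErr'].Value   # 0x4000
```

The same (?m)^[ \t]* prefix could in principle be put in front of either pattern above, provided the input is still passed as one string.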