PowerShell: clean string data using words in a defined list

Time:12-26

$ignoreList =  @("muzi","puzi")

$data = "
blabla aa 11
blabla bb 22
muzi aa 20
muzi bb aa
aaa aa 41
blabla aa 20
puzi aa 11
puzi bb 32
puzi cc 44"

I need to create new data that holds all the data except the entries that are also in the ignore list.

# I can iterate over the list in a loop, set $str to the current item in the list,
# and then save the result each time
$data | where-object {$_ -notlike $str}

I figure there's some better option than iterating over the list and saving each time.

CodePudding user response:

-like can only handle one pattern (wildcard expression) at a time.
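For comparison, the loop-based approach described in the question could look roughly like the sketch below. It assumes the data is already an array of lines and that the ignore words contain no wildcard characters; the shortened sample data and the variable name $result are illustrative, not from the original post.

```powershell
$ignoreList = @('muzi', 'puzi')
$data = @('blabla aa 11', 'muzi aa 20', 'aaa aa 41', 'puzi bb 32')

# One pass per ignore word, because -notlike accepts only a single wildcard pattern.
$result = $data
foreach ($word in $ignoreList) {
    # With an array on the left, -notlike acts as a filter.
    # @(...) keeps the result an array even if only one line survives.
    $result = @($result -notlike "$word*")
}

$result   # the two lines that do not start with an ignore word
```

This works, but it filters the data once per ignore word, which is what the single-operation options avoid.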

To match against multiple patterns in a single operation, you have two options:

  • Use the regex-based -notmatch operator with an alternation expression (|). To have the ignore words used verbatim as part of the regex, escape them with [regex]::Escape(); this isn't strictly necessary with your specific search terms, so in this simple case you could get away with '^(?:{0})' -f ($ignoreList -join '|'). Using a regex also lets you assert that each ignore word must be found at the start of each string (^):
$ignoreList =  @("muzi","puzi")

# Create an *array* of sample lines.
$data = @'
blabla aa 11
blabla bb 22
muzi aa 20
muzi bb aa
aaa aa 41
blabla aa 20
puzi aa 11
puzi bb 32
puzi cc 44
'@ -split '\r?\n'

# The programmatically created regex results in:
#    '^(?:muzi|puzi)'
# The ?: part isn't strictly necessary, but makes the (...) group
# non-capturing, which prevents unnecessary work.
$data -notmatch ('^(?:{0})' -f ($ignoreList.ForEach({ [regex]::Escape($_) }) -join '|'))
  • Use the Select-String cmdlet with multiple patterns (though you may use a single one with alternation too), which may be literal search terms if you add -SimpleMatch. This approach is simpler, but slower, due to use of the pipeline. Note that with -SimpleMatch the ignore words are matched anywhere in each line, not just at the start:
# Note the need to use (...).Line to extract the matching strings.
# In PowerShell (Core) 7+ you could use -Raw instead.
($data | Select-String -Pattern $ignoreList -SimpleMatch -NotMatch).Line
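To illustrate why escaping matters in the regex-based option, here is a minimal sketch that wraps it in a helper. The function name Remove-IgnoredLine, its parameter names, and the sample data are hypothetical and chosen only for this example; the ignore words deliberately contain regex metacharacters (+ and .).

```powershell
# Hypothetical helper, not part of the original answer.
function Remove-IgnoredLine {
    param(
        [string[]] $Line,
        [string[]] $IgnoreWord
    )
    # Without ignore words there is nothing to filter out.
    if (-not $IgnoreWord) { return $Line }

    # Escape each word so that characters such as '+' or '.' match literally.
    $escaped = foreach ($w in $IgnoreWord) { [regex]::Escape($w) }
    $pattern = '^(?:{0})' -f ($escaped -join '|')

    # With an array on the left, -notmatch acts as a filter.
    @($Line) -notmatch $pattern
}

$sample = 'c++ aa 1', 'cxx bb 2', 'a.b cc 3', 'axb dd 4'
$kept = Remove-IgnoredLine -Line $sample -IgnoreWord 'c++', 'a.b'
$kept   # 'cxx bb 2' and 'axb dd 4'
```

Unescaped, 'c++' is not a valid regex, and 'a.b' would also match 'axb', so the last line would be dropped by mistake.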