Powershell - Replace multiple 5-digit substrings in a string

Time:06-03

Some of our customers do not include our INV prefix when they pay invoices, so I want to add it to our MT940 bank statement file.

:20: :25:MHAZNL2AXXX/0376603160 :28C:102/ :60F:C220525EUR5000,
:61:2204130413C336,52NMSCTOPF2500472627//GBBR001SCT TOPF2500472627
:86:17136 17364 17382 032022102402 WONG TONG SARL 1 RUESOMETHINGPARIS SOGEFRPPXXX NOT PROVIDED
:61:2204200420C406,02NMSCTOPF2500479378//GBCJ005SCT TOPF2500479378
:86:17486 17586 17697 17813 0320221054201 WONG TONG SARL 1RUE SOMETHINGPARIS SOGEFRPPXXX NOTPROVIDED

I need it to become:

:20: :25:MHAZNL2AXXX/0376603160 :28C:102/ :60F:C220525EUR5000,
:61:2204130413C336,52NMSCTOPF2500472627//GBBR001SCT TOPF2500472627
:86:INV17136 INV17364 INV17382 032022102402 WONG TONG SARL 1 RUESOMETHINGPARIS SOGEFRPPXXX NOT PROVIDED
:61:2204200420C406,02NMSCTOPF2500479378//GBCJ005SCT TOPF2500479378
:86:INV17486 INV17586 INV17697 INV17813 0320221054201 WONG TONG SARL 1RUE SOMETHINGPARIS SOGEFRPPXXX NOTPROVIDED

I use a switch statement because I need to match other lines as well.

switch -Regex -File c:\Temp\WONG.ged  {
    ':86:.* WING SENG.* ' { $_.replace('([1234567]\d{4}[ ])', "INV$1") }
    default               { $_ } # unrelated line, pass through
}

The switch itself seems to work, but the replacement does not (if I replace, e.g., WONG with TEST, it works fine). I added [1234567] to keep the pattern from matching too much, but it still matches too much.

Answer:

There are a couple of issues here:

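  • You need to use the -replace operator, which performs regex-based replacement, not the String.Replace method, which only replaces literal text (see What's the difference between .replace and -replace in powershell?)
  • Your text does not contain WING SENG, so that switch case never matches and every line falls through to default unchanged
  • When you use a backreference in the replacement string, PowerShell must pass a literal $ to the regex engine, so either use single quotes around the replacement, 'INV$1', or escape the $ with a backtick, "INV`$1". Inside plain double quotes, PowerShell expands $1 as a variable before -replace ever sees it.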
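To see the three points side by side, here is a small sketch on one of the sample :86: lines, using the question's original pattern. It assumes the default (non-strict) mode, where an undefined variable such as $1 expands to an empty string inside double quotes.

```powershell
# One :86: line from the question (shortened)
$line = ':86:17136 17364 17382 032022102402 WONG TONG SARL'

# String.Replace treats the pattern as literal text; that text does not occur, so nothing changes
$literal = $line.Replace('([1234567]\d{4}[ ])', 'INV$1')

# -replace interprets the pattern as a .NET regex; single quotes keep $1 intact for the regex engine
$regex = $line -replace '([1234567]\d{4}[ ])', 'INV$1'
# ':86:INV17136 INV17364 INV17382 032022102402 WONG TONG SARL'

# In double quotes PowerShell expands the (undefined) variable $1 first, so each match becomes just 'INV'
$expanded = $line -replace '([1234567]\d{4}[ ])', "INV$1"
# ':86:INVINVINV032022102402 WONG TONG SARL'

# Escaping the dollar sign with a backtick passes a literal $1 through to -replace
$escaped = $line -replace '([1234567]\d{4}[ ])', "INV`$1"

$literal
$regex
$expanded
$escaped
```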

This will yield the expected result:

switch -Regex -File c:\Temp\WONG.ged {
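 ':86:.* WONG TONG'  { $_ -replace '\b[1-7]\d{4}\b', 'INV$&' }
 default             { $_ } # unrelated line, pass through
}

Note the pattern looks a bit different: [1-7] is shorter and leaner than a verbose [1234567], and there is no need to enclose the whole pattern in a capturing group, since you can refer to the whole match with the $& backreference. The \b word boundaries on both sides ensure that only standalone five-digit numbers get the prefix; a pattern that only requires a trailing space (such as '[1-7]\d{4} ') would also match the last five digits of 0320221054201 and turn it into 03202210INV54201. Also, there is no need to add .* at the end of the condition regex.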
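A quick way to compare a trailing-space-only pattern with a version bounded on both sides is to run both through switch -Regex -File on the second sample record. This is a sketch: the temporary file is a stand-in for c:\Temp\WONG.ged, and the :86: line is shortened.

```powershell
# Write two sample lines to a temporary file (stand-in for c:\Temp\WONG.ged)
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value @(
    ':61:2204200420C406,02NMSCTOPF2500479378//GBCJ005SCT TOPF2500479378'
    ':86:17486 17586 17697 17813 0320221054201 WONG TONG SARL'
)

# Only a trailing space is required: the tail 54201 of 0320221054201 matches too
$trailingOnly = switch -Regex -File $path {
    ':86:.* WONG TONG' { $_ -replace '[1-7]\d{4} ', 'INV$&' }
    default            { $_ } # unrelated line, pass through
}

# \b on both sides: only standalone five-digit numbers get the prefix
$bounded = switch -Regex -File $path {
    ':86:.* WONG TONG' { $_ -replace '\b[1-7]\d{4}\b', 'INV$&' }
    default            { $_ } # unrelated line, pass through
}

Remove-Item -Path $path

$trailingOnly[1]  # :86:INV17486 INV17586 INV17697 INV17813 03202210INV54201 WONG TONG SARL
$bounded[1]       # :86:INV17486 INV17586 INV17697 INV17813 0320221054201 WONG TONG SARL
```

Capturing the switch output in a variable, as above, also makes it easy to pipe the result to Set-Content afterwards.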

There is an alternative solution:

(Get-Content $filepath) -replace '(\G(?!^)|^(?=:86:.* WONG TONG))(.*?)(\b[1-7]\d{4}\b)', '$1$2INV$3'

Details:

  • (\G(?!^)|^(?=:86:.* WONG TONG)) - Group 1: either the end of the previous successful match (\G(?!^)) or (|) the start of the string, here a line (^), that is immediately followed by :86:, zero or more chars other than a line feed char (as many as possible), and then the string " WONG TONG"
  • (.*?) - Group 2 ($2): zero or more chars other than line feed chars, as few as possible
  • (\b[1-7]\d{4}\b) - Group 3 ($3): a digit from 1 to 7 followed by any four digits, with a word boundary on both sides, so the five digits cannot be part of a longer number such as 0320221054201.

(Get-Content $filepath) returns the file as an array of lines, and -replace applied to an array processes each line separately, so any lines that do not match are output as is, unaffected.
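Here is a minimal sketch of the \G approach on the full sample. It assumes an in-memory array as a stand-in for (Get-Content $filepath) and uses \b word boundaries around the five digits so that the tail of 0320221054201 is left alone.

```powershell
# In-memory stand-in for (Get-Content $filepath): one string per line
$lines = @(
    ':61:2204130413C336,52NMSCTOPF2500472627//GBBR001SCT TOPF2500472627'
    ':86:17136 17364 17382 032022102402 WONG TONG SARL 1 RUESOMETHINGPARIS SOGEFRPPXXX NOT PROVIDED'
    ':86:17486 17586 17697 17813 0320221054201 WONG TONG SARL 1RUE SOMETHINGPARIS SOGEFRPPXXX NOTPROVIDED'
)

# ^(?=...) starts the chain only on ":86: ... WONG TONG" lines; \G(?!^) continues it
# right where the previous match ended, so each invoice number is prefixed in turn
$pattern = '(\G(?!^)|^(?=:86:.* WONG TONG))(.*?)(\b[1-7]\d{4}\b)'

# -replace on an array applies the regex to every element and returns a new array
$result = $lines -replace $pattern, '$1$2INV$3'

$result
```

The :61: line comes back unchanged because neither alternative of Group 1 can match on it, and the chain on each :86: line stops at the first stretch of text that contains no further standalone five-digit number.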
