Some of our customers do not mention our INV prefix when they pay invoices, so I want to add it to the invoice numbers in our MT940 bank statement file.
:20: :25:MHAZNL2AXXX/0376603160 :28C:102/ :60F:C220525EUR5000,
:61:2204130413C336,52NMSCTOPF2500472627//GBBR001SCT TOPF2500472627
:86:17136 17364 17382 032022102402 WONG TONG SARL 1 RUESOMETHINGPARIS SOGEFRPPXXX NOT PROVIDED
:61:2204200420C406,02NMSCTOPF2500479378//GBCJ005SCT TOPF2500479378
:86:17486 17586 17697 17813 0320221054201 WONG TONG SARL 1RUE SOMETHINGPARIS SOGEFRPPXXX NOTPROVIDED
I need it to be
:20: :25:MHAZNL2AXXX/0376603160 :28C:102/ :60F:C220525EUR5000,
:61:2204130413C336,52NMSCTOPF2500472627//GBBR001SCT TOPF2500472627
:86:INV17136 INV17364 INV17382 032022102402 WONG TONG SARL 1 RUESOMETHINGPARIS SOGEFRPPXXX NOT PROVIDED
:61:2204200420C406,02NMSCTOPF2500479378//GBCJ005SCT TOPF2500479378
:86:INV17486 INV17586 INV17697 INV17813 0320221054201 WONG TONG SARL 1RUE SOMETHINGPARIS SOGEFRPPXXX NOTPROVIDED
I use a switch statement because I need to match other lines as well.
switch -Regex -File c:\Temp\WONG.ged {
':86:.* WING SENG.*' { $_.replace('([1234567]\d{4}[ ])', "INV$1") }
default { $_ } # unrelated line, pass through
}
The switch itself seems to work, but the replace does not (if I replace e.g. WONG with TEST, it works fine). I added [1234567] to prevent it from matching too much, but it still matches too much.
CodePudding user response:
There are a couple of issues here:
- You need to use the -replace operator, which supports regex replacement, not the String.Replace method, which only replaces literal text (see What's the difference between .replace and -replace in powershell?).
- Your text does not contain WING SENG, so there is no match.
- When you define a backreference, you need a literal $, so either use single quotes around the replacement, 'INV$1', or escape the $ with a backtick, "INV`$1".
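To make the difference concrete, here is a minimal sketch on a shortened sample line. The variable names are only illustrative and are not part of the original script.

```powershell
# Shortened sample line (illustrative only)
$line = ':86:17136 17364 WONG TONG SARL'

# String.Replace looks for the literal text '[1-7]\d{4}', which does not
# occur in the line, so the line comes back unchanged.
$literal = $line.Replace('[1-7]\d{4}', 'INV$&')

# -replace treats the pattern as a regex; single quotes keep $& literal,
# so the regex engine sees it as "the whole match".
$regex = $line -replace '[1-7]\d{4}', 'INV$&'

# In double quotes the $ must be escaped with a backtick; otherwise PowerShell
# expands $1 as a variable before the regex engine ever sees it.
$escaped = $line -replace '([1-7]\d{4})', "INV`$1"

$literal   # :86:17136 17364 WONG TONG SARL
$regex     # :86:INV17136 INV17364 WONG TONG SARL
$escaped   # :86:INV17136 INV17364 WONG TONG SARL
```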
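With these fixes, the following yields the expected result:
switch -Regex -File c:\Temp\WONG.ged {
':86:.* WONG TONG' { $_ -replace '\b[1-7]\d{4} ', 'INV$&' }
default { $_ } # unrelated line, pass through
}
Note the pattern looks a bit different: [1-7] is shorter and leaner than the verbose [1234567], and there is no need to enclose the whole pattern in a capturing group, since you can refer to the whole match with the $& backreference. The leading \b (word boundary) keeps the pattern from matching the last five digits of a longer number: without it, 54201 inside 0320221054201 would also get the prefix. Also, there is no need for a trailing wildcard at the end of the switch regex, because -Regex matches anywhere in the line.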
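As a quick check, the switch approach can be tried on an in-memory array before touching the real file. This is only a sketch: the $lines and $updated names are made up for the example, and the pattern carries a leading \b so that only standalone five-digit numbers get the prefix.

```powershell
# Two sample lines from the question; a real run would use -File instead.
$lines = @(
    ':61:2204130413C336,52NMSCTOPF2500472627//GBBR001SCT TOPF2500472627'
    ':86:17136 17364 17382 032022102402 WONG TONG SARL 1 RUESOMETHINGPARIS SOGEFRPPXXX NOT PROVIDED'
)

# switch emits one output per input line, so the result can be captured.
$updated = switch -Regex ($lines) {
    ':86:.* WONG TONG' { $_ -replace '\b[1-7]\d{4} ', 'INV$&' }
    default            { $_ }   # unrelated line, pass through
}

$updated
```

The captured output can then be piped to Set-Content to write a new file.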
There is an alternative solution:
(Get-Content $filepath) -replace '(\G(?!^)|^(?=:86:.* WONG TONG))(.*?)(\b[1-7]\d{4}(?!\S))', '$1$2INV$3'
Details:
- (\G(?!^)|^(?=:86:.* WONG TONG)) - Group 1 ($1): either the end of the previous successful match (\G(?!^)) or (|) the start of the string, here a line (^), that is immediately followed by :86:, any zero or more chars other than a line feed char, as many as possible, and then the " WONG TONG" string
- (.*?) - Group 2 ($2): any zero or more chars other than line feed chars, as few as possible
- (\b[1-7]\d{4}(?!\S)) - Group 3 ($3): a word boundary, then a digit from 1 to 7 and any four digits that are at the end of the string or immediately followed by whitespace. The \b stops the group from matching the tail of a longer number such as 0320221054201.
(Get-Content $filepath) reads the file line by line, so any lines that are not matched will be output as is, unaffected.
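The single-expression variant can be sanity-checked the same way. The sketch below is hedged: $lines, $pattern, $result and $outpath are hypothetical names, and the pattern includes a leading \b in group 3, added here as an assumption so that the tail of a longer number is not prefixed.

```powershell
# Sample lines from the question; a real run would use Get-Content $filepath.
$lines = @(
    ':61:2204200420C406,02NMSCTOPF2500479378//GBCJ005SCT TOPF2500479378'
    ':86:17486 17586 17697 17813 0320221054201 WONG TONG SARL 1RUE SOMETHINGPARIS SOGEFRPPXXX NOTPROVIDED'
)

$pattern = '(\G(?!^)|^(?=:86:.* WONG TONG))(.*?)(\b[1-7]\d{4}(?!\S))'

# -replace on an array is applied to every element and returns an array.
$result = $lines -replace $pattern, '$1$2INV$3'
$result

# To write the result to a file ($outpath is a hypothetical target path):
# (Get-Content $filepath) -replace $pattern, '$1$2INV$3' | Set-Content $outpath
```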