Home > other >  Powershell - Match multiple 5-digit substrings in a string
Powershell - Match multiple 5-digit substrings in a string

Time:05-21

We send sales invoices with a prefix of INV. Some customers pay the invoices without the prefix. I want to add the prefix in that case to have the ERP-system recognise these payments. An example of the date is below.

:61:2204210421C1339,57NMSCTOPF2510474511//GBBK031SCT TOPF2510474511
:86:RGT FACT 17133 TANQ BROERS SA/RGT39370 TANQ BROERS SA48 AVENUE D'ABCDE :61:2204270427C4808,37NMSCTOPF2520477320//GBJ6009SCT
TOPF2520477320 :86:RGT FACT 17274.17442.17546 TANQ BROERS SA/RGT39370 TANQ BROERS SA48 AVENUE D'ABCDE
:61:2203290329C5518,16NMSCTOPF2485471711//GBCJ001SCT TOPF2485471711 :86:RGT FACT.16794 16918 17079 TANQ BROERS SA/RGT39370 TANQ BROERS SA48 AVENUE D'ABCDE

I need the output to be:

:61:2204210421C1339,57NMSCTOPF2510474511//GBBK031SCT TOPF2510474511 :86:RGT FACT INV17133 TANQ BROERS SA/RGT39370 TANQ BROERS SA48 AVENUE D'ABCDE
:61:2204270427C4808,37NMSCTOPF2520477320//GBJ6009SCT TOPF2520477320
:86:RGT FACT INV17274.INV17442.INV17546 TANQ BROERS SA/RGT39370 TANQ BROERS SA48 AVENUE D'ABCDE :61:2203290329C5518,16NMSCTOPF2485471711//GBCJ001SCT TOPF2485471711 :86:RGT FACT.INV16794 INV16918 INV17079 TANQ BROERS SA/RGT39370 TANQ BROERS SA48 AVENUE D'ABCDE

I have made this script but it only matches the first invoicenumber. How do I match them all in a group?

(:61:[0-9]{1,6}[0-9]{4}C[0-9] \,[0-9]?[0-9]?)(NMSC. )(\r?\n:86:RGT FACT.{1})([\d]{5})

The payment description will always be similar but not always exactly like this. The order will be like this but I am not sure whether they allways use dots to seperate the invoicenumbers for example.

CodePudding user response:

@Thefourthbird gave the answer in the comments, it works perfectly and I used following script to adjust the MT940 statement and put it to Production:

(Get-Content -Path $fileName -Raw) -replace '(?:(:61:[0-9]{1,6}[0-9]{4}C[0-9] ,[0-9]{0,2})(NMSC. )(\r?\n:86:RGT FACT)|\G(?!^))[ .](\d{5})' ,'$1$2$3 INV$4' | Set-Content  -Path $fileName

It takes the MT940 text file and puts INV in front of a 5 digit number if the string starts with :86:RGT FACT.

CodePudding user response:

You might use the \G anchor to get contiguous matches for the last 5 digits separated by either a space or dot.

Note that you can omit {1} from the pattern, and omit the . after FACT as that becomes part of the repeating using \G

(?:(:61:[0-9]{1,6}[0-9]{4}C[0-9] ,[0-9]{0,2})(NMSC. )(\r?\n:86:RGT FACT)|\G(?!^))[ .](\d{5})

Explanation

  • (?: Non capture group
    • (:61:[0-9]{1,6}[0-9]{4}C[0-9] ,[0-9]{0,2})(NMSC. )(\r?\n:86:RGT FACT) Your initial pattern
    • | OR
    • \G(?!^) Assert the position at the end of the previous match (not at the start)
  • ) Close the alternation
  • [ .](\d{5}) Match either a space or dot and capture 5 digits

See a Powershell demo and a regex demo.

Example

$input = @"
:61:2204210421C1339,57NMSCTOPF2510474511//GBBK031SCT TOPF2510474511
:86:RGT FACT 17133 TANQ BROERS SA/RGT39370 TANQ BROERS SA48 AVENUE D'ABCDE
:61:2204270427C4808,37NMSCTOPF2520477320//GBJ6009SCT TOPF2520477320
:86:RGT FACT 17274.17442.17546 TANQ BROERS SA/RGT39370 TANQ BROERS SA48 AVENUE D'ABCDE
:61:2203290329C5518,16NMSCTOPF2485471711//GBCJ001SCT TOPF2485471711
:86:RGT FACT.16794 16918 17079 TANQ BROERS SA/RGT39370 TANQ BROERS SA48 AVENUE D'ABCDE
"@

$input -replace '(?:(:61:[0-9]{1,6}[0-9]{4}C[0-9] ,[0-9]{0,2})(NMSC. )(\r?\n:86:RGT FACT)|\G(?!^))[ .](\d{5})' ,'$1$2$3 INV$4'

Output

:61:2204210421C1339,57NMSCTOPF2510474511//GBBK031SCT TOPF2510474511
:86:RGT FACT INV17133 TANQ BROERS SA/RGT39370 TANQ BROERS SA48 AVENUE D'ABCDE
:61:2204270427C4808,37NMSCTOPF2520477320//GBJ6009SCT TOPF2520477320
:86:RGT FACT INV17274 INV17442 INV17546 TANQ BROERS SA/RGT39370 TANQ BROERS SA48 AVENUE D'ABCDE
:61:2203290329C5518,16NMSCTOPF2485471711//GBCJ001SCT TOPF2485471711
:86:RGT FACT INV16794 INV16918 INV17079 TANQ BROERS SA/RGT39370 TANQ BROERS SA48 AVENUE D'ABCDE
  • Related