Regular expression over EDI File

Time:04-21

I have the following EDI file and need to match each LOC 11 element, but not the LOC 7 element on its own. Each match should start at LOC 11 and include all segments up to the next LOC 11. The LOC segment is repeated, but the segments between the repetitions are not.

At the moment my regex is LOC[^L]*(?:L(?!OC)[^L]*)*, but with it I get 4 results because it also matches the LOC 7 elements.

I only need the 2 results. Could you help me?

> NAD ST 14::92  Test' LOC 11 KOD23277::92' LOC 7 D77::92:Test' LIN 1  
> test AP:IN'IMD F  12::272:K
> RIPPsadasdRIEM'RFF ON:EN10514492'RFF AAN:501'
> DTM 171:20220309:102'RFF AIF:500'DTM 171:20220305:102'CTA SC 12414:test,
> test'COM [email protected]:EM'
> COM ? 49-561-490-4173:TE'COM ? 49-561-490-84173:FX' QTY 83:1000:PCE'
> QTY 70:66850:PCE'DTM 51:20080101:102'
> QTY 72:0:PCE'DTM 52:20080101:102'
> QTY 194:1000:PCE'DTM 50:20220224:102'
> RFF AAU:2143276'DTM 171:20220218:102'
> QTY 194:1000:PCE'DTM 50:20220202:102'
> RFF AAU:2138944'DTM 171:20220131:102'
> QTY 194:1000:PCE'DTM 50:20220105:102'
> RFF AAU:2138943'DTM 171:20220103:102' SCC 24'
> QTY 113:1000:PCE'DTM 2:20220412:102'
> QTY 113:1000:PCE'DTM 2:20220503:102'
> QTY 113:1000:PCE'DTM 64:20220530:102'DTM 63:20220605:102'
> QTY 113:1000:PCE'DTM 64:20220620:102'DTM 63:20220626:102'
> QTY 113:1000:PCE'DTM 64:20220711:102'DTM 63:20220717:102'
> QTY 113:1000:PCE'DTM 64:20220801:102'DTM 63:20220807:102' GEI 3 37'
> 
> NAD ST 14::92  test' LOC 11 KOD823226::92' LOC 7 D86::92:Test' LIN 2  
> test H:IN'IMD F  12::272:K
> RIPPRIEM'RFF ON:EN10662318'RFF AAN:266'DTM 171:20220309:102'
> RFF AIF:265'DTM 171:20220305:102'CTA SC 12414:test,
> test'COM [email protected]:EM'
> COM ? 49-561-490-4173:TE'COM ? 49-561-490-84173:FX' QTY 83:200:PCE'
> QTY 70:14319:PCE'DTM 51:20100101:102'
> QTY 72:0:PCE'DTM 52:20100101:102' QTY 194:200:PCE'DTM 50:20220126:102'
> RFF AAU:2146871'DTM 171:20220121:102'
> QTY 194:200:PCE'DTM 50:20211210:102'RFF AAU:2146914'DTM 171:20211209:102' QTY 194:200:PCE'DTM 50:20211129:102'RFF AAU:2139927'DTM 171:20211124:102'SCC 24'
> QTY 113:200:PCE'DTM 2:20220503:102'
> QTY 113:200:PCE'DTM 64:20220606:102'DTM 63:20220612:102'
> QTY 113:200:PCE'DTM 64:20220718:102'DTM 63:20220724:102'
> QTY 113:200:PCE'DTM 64:20220829:102'DTM 63:20220904:102'
> QTY 113:200:PCE'DTM 64:20221010:102'DTM 63:20221016:102'
> 
> UNT 142 1'UNZ 1 2756'

CodePudding user response:

You can use either of the following patterns:

LOC\ 11[^L]*(?:L(?!OC\ 11)[^L]*)*
LOC\ 11[\w\W]*?(?=LOC\ 11|$)


Details:

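  • LOC\ 11 - the literal LOC 11 string
  • [^L]*(?:L(?!OC\ 11)[^L]*)* - any text up to, but not including, the next occurrence of the LOC 11 substring, or up to the end of the string if there is none (uses the unroll-the-loop principle: runs of non-L characters, plus any L that does not start OC 11)
  • [\w\W]*?(?=LOC\ 11|$) - in the second pattern, any characters, including line breaks, as few as possible, up to the next LOC 11 or the end of the string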

Although the results you get with the two patterns above are identical, the first one is much faster, provided there are not too many Ls that are not followed by OC 11.
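To check this behavior yourself, here is a minimal Python sketch. It assumes Python's `re` module, which the question does not mention, and a shortened sample modeled on the EDI data above rather than the full file. The names `SAMPLE`, `ORIGINAL`, `UNROLLED`, `LAZY`, and `find_blocks` are made up for illustration.

```python
import re

# Shortened sample modeled on the EDI data in the question (not the full file).
# It is built without a trailing newline so that `$` means the true end of text.
SAMPLE = "\n".join([
    "NAD ST 14::92  Test' LOC 11 KOD23277::92' LOC 7 D77::92:Test' LIN 1",
    "QTY 83:1000:PCE' GEI 3 37'",
    "NAD ST 14::92  test' LOC 11 KOD823226::92' LOC 7 D86::92:Test' LIN 2",
    "QTY 83:200:PCE'",
    "UNT 142 1'UNZ 1 2756'",
])

# Pattern from the question: stops at every LOC, so LOC 7 starts a new match.
ORIGINAL = re.compile(r"LOC[^L]*(?:L(?!OC)[^L]*)*")

# Unrolled-loop pattern: stops only where "LOC 11" begins again.
UNROLLED = re.compile(r"LOC\ 11[^L]*(?:L(?!OC\ 11)[^L]*)*")

# Lazy pattern with a lookahead for the next "LOC 11" or the end of the string.
LAZY = re.compile(r"LOC\ 11[\w\W]*?(?=LOC\ 11|$)")


def find_blocks(text, pattern):
    """Return every full match of `pattern` in `text` as a list of strings."""
    return pattern.findall(text)


original_matches = find_blocks(SAMPLE, ORIGINAL)
unrolled_matches = find_blocks(SAMPLE, UNROLLED)
lazy_matches = find_blocks(SAMPLE, LAZY)

print(len(original_matches), len(unrolled_matches), len(lazy_matches))
for block in unrolled_matches:
    print(repr(block))
```

Note that each match runs up to the next LOC 11, so it also contains the NAD segment that precedes that next LOC 11. If the input ends with a line break, the lazy pattern stops before the final newline while the unrolled one includes it, so strip the input first if you need the two to agree exactly.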
