Home > Back-end >  How can I effectively parse text into groups?
How can I effectively parse text into groups?

Time:07-04

I have a set of pdf invoices that need to parsed into respectove columns/groups like below:

enter image description here

csv

To add text,

Data
28/06/22      7694759.0        Service Rear Lift 660L Cardboard                           1      $5.00            $5.00
30/06/22      7710113.0        Service Rear Lift 240L Co-Mingled           Bin Empty      1     $10.25           $10.25
28/06/22      7694756.0        Service Rear Lift 660L Putrescible                         2     $28.50           $57.00
30/06/22      7714583.0        Service Rear Lift 240L Putrescible                         1     $13.91           $13.91
Site: 111947.021 Rental Charge (H8.0 x 1 / RI) GRN (x 5)                              5       $10.00          $50.00
21/06/22     7648522.0   Futile Service Front Lift 1.5M mick-email  1   20  20
28/06/22     7686945.0   Removal Front Lift 4.5M Putrescible    full on removal 2   100 200
03/06/22     7537009.0  Service Rear Lift 240L Putrescible  JENNIFER    1   13.34   1.364

to be parsed into

Date            Site        Description                             Description_2   Qty Price   Total
28/06/22        7694759.0   Service Rear Lift 660L Cardboard                        1   5       5
30/06/22        7710113.0   Service Rear Lift 240L Co-Mingled       Bin Empty       1   10.25   10.25
28/06/22        7694756.0   Service Rear Lift 660L Putrescible                      2   28.5    57
30/06/22        7714583.0   Service Rear Lift 240L Putrescible                      1   13.91   13.91
                111947.021  Rental Charge (H8.0 x 1 / RI) GRN (x 5)                 5   10      50
21/06/22        7648522.0   Futile Service Front Lift 1.5M          mick-email      1   20      20
28/06/22        7686945.0   Removal Front Lift 4.5M Putrescible     full on removal 2   100     200
03/06/22        7537009.0   Service Rear Lift 240L Putrescible      JENNIFER        1   13.34   1.364

How can I perform the grouping in regex?

CodePudding user response:

You can use the following regex:

^([\d\/] )?\s (\d \.\d )\s (\S (?:\s\S )*)\s (\S (?:\s\S )*)?\s*([\d\.] )\s ([\d\.] )\s ([\d\.] )$

Regex Explanation:

  • ^: start of string
  • ([\d\/] )?: Group 1 (the date) - optional
  • \s : spaces
  • (\d \.\d ): Group 2 (the site)
  • \s : spaces
  • (\S (?:\s\S )*): Group 3 (the description)
  • \s : spaces
  • (\S (?:\s\S )*)?: Group 4 (the description 2) - optional
  • \s*: spaces - optional
  • ([\d\.] ): Group 5 (the quantity)
  • \s: spaces
  • ([\d\.] ): Group 6 (the price)
  • \s: spaces
  • ([\d\.] ): Group 7 (the total)
  • $: end of string

You can access the single groups by their group number.

Check the demo here.

  • Related