Regex need to consider for all patterns in same group

Time:12-01

Regex need to consider two patterns in same group

Sample data:

  1. mixexecutor:check_atom_exists:740 - requested to check this machine : **ET_colBackDDW_Temp**

  2. output_of_reports/PII/36478_**ABP_BAL_liquidpressure**-**20210831-123456**-**20210831-172355**.bat.yz

  3. packofexecutors:_to_signle_que:869-no mata for file'/private/external_control_time_mapped_low_volume/IBA/54378_BD-**RT_69-1**-1-**20200831-152355**-**20200831-172355**.dat.xz'

  4. packofexecutors:_to_signle_que:869-no mata for file'/private/external_control_time_mapped_low_volume/IBA/54378_BD-**RT_69**-1-**20200831-152355**-**20200831-172355**.dat.xz'

  5. mixexecutor:check_atom_exists:740 - requested to check this machine : **Eanes_colBack12_current**

  6. packofexecutors._check_tar.587-nr of missed files=78 nr of skipped records=6547 nr of records not exist=0

  7. packofexecutors._filter_mistacl_signals:777 - invalid atomname for **RT_6**:ESmotormeaninfAmkl

All of this data is in the same column, and I need to extract the highlighted values.

Expected output:

  1. **ET_colBackDDW_Temp** --> group 1

  2. **ABP_BAL_liquidpressure** --> group 1, 20210831-123456 --> group 2, 20210831-172355 --> group 3

  3. **RT_69-1** --> group 1, 20200831-152355 --> group 2, 20200831-172355 --> group 3

  4. **RT_69** --> group 1, 20200831-152355 --> group 2, 20200831-172355 --> group 3

  5. **Eanes_colBack12_current** --> group 1

  6. None

  7. **RT_6** --> group 1

This is what I have tried so far. The regex does not need to depend on the surrounding words:

(^.*?((?:[a-zA-Z0-9]+_)+[A-Z]\w+)(?:-[0-9]{1,7})?)-([0-9]{8}-[0-9]{6})-([0-9]{8}-[0-9]{6}))?)

I am using PySpark to apply the regex.

My attempt is here: https://regex101.com/r/yEBUxX/1

CodePudding user response:

To get the values in 1 or 3 groups using a single pattern, you might use:

^.*?([A-Z]\w*_\w+)(?:-([0-9]{8}-[0-9]{6})-([0-9]{8}-[0-9]{6}))?

The pattern matches:

  • ^ Start of string
  • .*? Match as few chars as possible
  • ( Capture group 1
    • [A-Z]\w*_ Match a char A-Z, optional word chars, and _
    • \w+ Match 1+ word chars
  • ) Close group 1
  • (?: Non capture group
    • - Match literally
    • ([0-9]{8}-[0-9]{6}) Capture group 2
    • - Match literally
    • ([0-9]{8}-[0-9]{6}) Capture group 3
  • )? Close non capture group and make it optional
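
To see what the pattern captures, here is a minimal sketch that runs it over the seven sample lines with Python's `re` module. It assumes the final `\w` in group 1 carries a `+` quantifier (one or more word chars) and that the bold markers in the question are only highlighting, so they are left out of the data. The names `PATTERN`, `SAMPLES` and `extract` are made up for this illustration.

```python
import re

# Pattern from this answer; assumption: the last \w in group 1 is \w+.
PATTERN = re.compile(
    r"^.*?([A-Z]\w*_\w+)(?:-([0-9]{8}-[0-9]{6})-([0-9]{8}-[0-9]{6}))?"
)

# The seven sample lines from the question, without the bold highlighting.
SAMPLES = [
    "mixexecutor:check_atom_exists:740 - requested to check this machine : ET_colBackDDW_Temp",
    "output_of_reports/PII/36478_ABP_BAL_liquidpressure-20210831-123456-20210831-172355.bat.yz",
    "packofexecutors:_to_signle_que:869-no mata for file'/private/external_control_time_mapped_low_volume/IBA/54378_BD-RT_69-1-1-20200831-152355-20200831-172355.dat.xz'",
    "packofexecutors:_to_signle_que:869-no mata for file'/private/external_control_time_mapped_low_volume/IBA/54378_BD-RT_69-1-20200831-152355-20200831-172355.dat.xz'",
    "mixexecutor:check_atom_exists:740 - requested to check this machine : Eanes_colBack12_current",
    "packofexecutors._check_tar.587-nr of missed files=78 nr of skipped records=6547 nr of records not exist=0",
    "packofexecutors._filter_mistacl_signals:777 - invalid atomname for RT_6:ESmotormeaninfAmkl",
]


def extract(line):
    """Return (group 1, group 2, group 3), or None if the line does not match.

    Groups 2 and 3 are None when the optional timestamp part is absent.
    """
    match = PATTERN.match(line)
    return match.groups() if match else None


for number, line in enumerate(SAMPLES, start=1):
    print(number, extract(line))
```

Lines 1, 2, 5, 6 and 7 come out as listed in the expected output. For lines 3 and 4 the pattern as written stops at `RT_69` and leaves groups 2 and 3 empty, because the `-1` segments sit between the name and the first timestamp, so those two cases would need an extra optional part in the pattern. In PySpark, `regexp_extract(column, pattern, idx)` returns an empty string rather than `None` when the pattern or the requested group does not match.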
