Regex needs to handle two patterns in the same group
Sample data:
1.mixexecutor:check_atom_exists:740 - requested to check this machine : **ET_colBackDDW_Temp**
2.output_of_reports/PII/36478_**ABP_BAL_liquidpressure**-**20210831-123456**-**20210831-172355**.bat.yz
3.packofexecutors:_to_signle_que:869-no mata for file'/private/external_control_time_mapped_low_volume/IBA/54378_BD-**RT_69-1**-1-**20200831-152355**-**20200831-172355**.dat.xz'
4.packofexecutors:_to_signle_que:869-no mata for file'/private/external_control_time_mapped_low_volume/IBA/54378_BD-**RT_69**-1-**20200831-152355**-**20200831-172355**.dat.xz'
5.mixexecutor:check_atom_exists:740 - requested to check this machine : **Eanes_colBack12_current**
6.packofexecutors._check_tar.587-nr of missed files=78 nr of skipped records=6547 nr of records not exist=0
7.packofexecutors._filter_mistacl_signals:777 - invalid atomname for **RT_6**:ESmotormeaninfAmkl
Both kinds of records belong to the same column; I need to identify the highlighted values.
Expected output:
1.**ET_colBackDDW_Temp** --> group 1
2.**ABP_BAL_liquidpressure** --> group 1, 20210831-123456 --> group 2, 20210831-172355 --> group 3
3.**RT_69-1** --> group 1, 20200831-152355 --> group 2, 20200831-172355 --> group 3
4.**RT_69** --> group 1, 20200831-152355 --> group 2, 20200831-172355 --> group 3
5.**Eanes_colBack12_current** --> group 1
6.None
7.**RT_6**
I tried the regex below. The surrounding words in each line do not need to be matched.
(^.*?((?:[a-zA-Z0-9]+_)+[A-Z]\w+)(?:-[0-9]{1,7})?)-([0-9]{8}-[0-9]{6})-([0-9]{8}-[0-9]{6}))?)
I am using PySpark to apply the regex above.
See the demo here: https://regex101.com/r/yEBUxX/1
CodePudding user response:
To get the values in 1 or 3 groups using a single pattern, you might use:
^.*?([A-Z]\w*_\w+)(?:-([0-9]{8}-[0-9]{6})-([0-9]{8}-[0-9]{6}))?
The pattern matches:
- `^` Start of string
- `.*?` Match as few chars as possible
- `(` Capture group 1
  - `[A-Z]\w*_` Match A-Z, optional word chars, and `_`
  - `\w+` Match 1 or more word chars
- `)` Close group 1
- `(?:` Non-capture group
  - `-` Match literally
  - `([0-9]{8}-[0-9]{6})` Capture group 2
  - `-` Match `-`
  - `([0-9]{8}-[0-9]{6})` Capture group 3
- `)?` Close the non-capture group and make it optional
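To see the pattern in action, here is a minimal sketch using Python's standard `re` module rather than PySpark itself. It assumes the `**` markers in the question are only highlighting and are not present in the real data, and the helper name `extract_groups` is hypothetical. It runs the pattern on sample lines 1, 2, 3 and 6.

```python
import re

# Pattern from the answer: group 1 is the name, groups 2 and 3 are the
# two optional timestamps.
PATTERN = re.compile(
    r"^.*?([A-Z]\w*_\w+)(?:-([0-9]{8}-[0-9]{6})-([0-9]{8}-[0-9]{6}))?"
)


def extract_groups(line):
    """Return (group 1, group 2, group 3), or None when nothing matches."""
    m = PATTERN.match(line)
    return m.groups() if m else None


# Sample lines 1, 2, 3 and 6 from the question, with the ** markers removed
# (assumption: they are highlighting only).
samples = [
    "mixexecutor:check_atom_exists:740 - requested to check this machine : "
    "ET_colBackDDW_Temp",
    "output_of_reports/PII/36478_ABP_BAL_liquidpressure-20210831-123456-"
    "20210831-172355.bat.yz",
    "packofexecutors:_to_signle_que:869-no mata for file'/private/"
    "external_control_time_mapped_low_volume/IBA/54378_BD-RT_69-1-1-"
    "20200831-152355-20200831-172355.dat.xz'",
    "packofexecutors._check_tar.587-nr of missed files=78 nr of skipped "
    "records=6547 nr of records not exist=0",
]

for line in samples:
    print(extract_groups(line))
# ('ET_colBackDDW_Temp', None, None)
# ('ABP_BAL_liquidpressure', '20210831-123456', '20210831-172355')
# ('RT_69', None, None)
# None
```

On sample line 3, where `-1-1` sits between the name and the timestamps, the pattern as written captures only `RT_69` in group 1 and leaves groups 2 and 3 unmatched, so it would need extending to cover that case. In PySpark, the same pattern can be passed to `pyspark.sql.functions.regexp_extract(column, pattern, idx)`, once per group index; that function returns an empty string rather than null when the regex or the requested group does not match.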