Regexp - Get the first and last part

Is there a way to get the first and last part of the lines below? I'm guessing that a regexp is the way to go, preferably in Notepad++.

This doesn't have to be super optimized or anything. It's something that will be run manually now and then, so if it takes several minutes, that's fine.

If handling both the 'TimingLog' and 'HandleArticleWarningOnOrder' lines in the same regexp is a big problem, I can run two different regexps and combine the results.
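If a single combined pattern turns out to be awkward, the two-regexp fallback could look roughly like the sketch below. It uses Python's re module purely for illustration (the question is about an editor), and both patterns are hypothetical, modelled on the sample lines rather than taken from the question:

```python
import re

# Pass 1 (hypothetical pattern): TimingLog lines. Group 1 keeps the timestamp
# and "TimingLog ->"; greedy .* backtracks so group 2 is the last quoted number.
TIMING = re.compile(r"^(.{26}TimingLog ->).*('\d+')$", re.MULTILINE)

# Pass 2 (hypothetical pattern): only HandleArticleWarningOnOrder lines whose
# first key is -1, followed by the same "last quoted number" capture.
WARNING = re.compile(
    r"^(.{26}HandleArticleWarningOnOrder ->) +-1\b.*('\d+')$", re.MULTILINE
)


def two_pass(text: str) -> str:
    """Run both substitutions over the whole text, one after the other."""
    text = TIMING.sub(r"\1 \2", text)   # rewrite TimingLog lines
    return WARNING.sub(r"\1 \2", text)  # rewrite HandleArticleWarningOnOrder -> -1 lines
```

Each pass only touches the lines its own pattern matches, so running them in sequence has the same effect as one combined pattern.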

I used this regexp to find the lines in the first place; they come from a much bigger list with lots of rows that I'm not interested in:

^.{26}(HandleArticleWarningOnOrder -> -1.*|TimingLog.*)
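For reference, the same line selection can be reproduced outside the editor. This is a minimal Python sketch, assuming the log has already been split into lines. The pattern is adapted from the regexp above: TimingLog is spelled as in the log, and " +" (one or more spaces) sits between "->" and "-1" because the sample lines contain two spaces there:

```python
import re

# Adapted from the question's regexp (assumption): " +" accepts the two
# spaces that separate "->" from "-1" in the sample lines.
LINE_FILTER = re.compile(r"^.{26}(HandleArticleWarningOnOrder -> +-1.*|TimingLog.*)")


def select_lines(lines):
    """Keep only TimingLog lines and HandleArticleWarningOnOrder lines starting with -1."""
    return [line for line in lines if LINE_FILTER.match(line)]
```

A file could be fed in with open(path, encoding="utf-8").read().splitlines(); the path and encoding are placeholders.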

Note that the lines can be longer or shorter than in the example below.

Input

2022-01-11 09:52:35.65 -> TimingLog -> 1: '69' -2: '434' -3: '434' -4: '434' -5: '509' -6: '509' -6.1: '509' -7: '588' -19: '588' -20: '588' -21: '5145' -22: '5202' -23: '5224' -24: '5233' -25: '5233'
2022-01-11 09:52:48.82 -> TimingLog -> 1: '47' -2: '213' -3: '213' -4: '213' -5: '269' -6: '269' -6.1: '269' -7: '298' -8: '298' -12: '380' -13: '380' -14: '6270' -15: '6328' -16: '6347' -17: '6356' -18: '6356'
2022-01-11 09:53:02.68 -> TimingLog -> 1: '23' -2: '54' -3: '54' -4: '54' -5: '65' -6: '65' -6.1: '65' -7: '76' -19: '76' -20: '76' -21: '4916' -22: '4982' -23: '5010' -24: '5015' -25: '5015'
2022-01-11 09:53:06.57 -> HandleArticleWarningOnOrder ->  -1: '160' -2: '223' -1: '223' -2: '285' -1: '285' -2: '671' -1: '671' -2: '816' -1: '816' -2: '970' -1: '970' -2: '1122' -3: '1122' -4: '1312' -5: '17766' -6: '17766'
2022-01-11 09:53:17.01 -> TimingLog -> 1: '140' -2: '527' -3: '527' -4: '527' -5: '671' -6: '671' -6.1: '671' -7: '737' -19: '737' -20: '737' -21: '5984' -22: '6163' -23: '6307' -24: '6339' -25: '6339'
2022-01-11 09:53:25.12 -> TimingLog -> 1: '25' -2: '85' -3: '85' -4: '85' -5: '108' -6: '108' -6.1: '108' -7: '117' -19: '117' -20: '117' -21: '7706' -22: '7880' -23: '8018' -24: '8110' -25: '8110'
2022-01-11 09:53:31.90 -> TimingLog -> 1: '51' -2: '210' -3: '210' -4: '210' -5: '269' -6: '269' -6.1: '269' -7: '324' -19: '324' -20: '324' -21: '6641' -22: '6675' -23: '6704' -24: '6711' -25: '6711'
2022-01-11 09:53:44.04 -> TimingLog -> 1: '27' -2: '121' -3: '121' -4: '121' -5: '202' -6: '202' -6.1: '202' -7: '215' -19: '215' -20: '215' -21: '6520' -22: '6566' -23: '6594' -24: '6604' -25: '6604'
2022-01-11 09:53:53.51 -> TimingLog -> 1: '72' -2: '275' -3: '275' -4: '275' -5: '302' -6: '302' -6.1: '302' -7: '327' -8: '327' -12: '413' -13: '413' -14: '7408' -15: '7571' -16: '7725' -17: '7731' -18: '7731'
2022-01-11 09:54:04.27 -> TimingLog -> 1: '22' -2: '72' -3: '72' -4: '72' -5: '86' -6: '86' -6.1: '86' -7: '105' -8: '105' -12: '147' -13: '147' -14: '5192' -15: '5223' -16: '5251' -17: '5269' -18: '5269'
2022-01-11 09:54:09.16 -> HandleArticleWarningOnOrder ->  -1: '91' -2: '188' -2.1: '188' -3: '188' -4: '351' -5: '18276' -6: '18276'
2022-01-11 09:54:12.80 -> TimingLog -> 1: '13' -2: '43' -3: '43' -4: '43' -5: '51' -6: '51' -6.1: '51' -7: '57' -8: '57' -12: '86' -13: '86' -14: '8024' -15: '8263' -16: '8430' -17: '8524' -18: '8524'
2022-01-11 09:54:21.30 -> TimingLog -> 1: '105' -2: '353' -3: '353' -4: '353' -5: '414' -6: '414' -6.1: '414' -7: '470' -8: '470' -12: '814' -13: '814' -14: '8172' -15: '8336' -16: '8449' -17: '8480' -18: '8480'
2022-01-11 09:54:34.02 -> HandleArticleWarningOnOrder ->  -1: '102' -2: '154' -2.1: '154' -3: '154' -4: '202' -5: '20106' -6: '20106'
...

Preferred Output

2022-01-11 09:52:35.65 -> TimingLog -> '5233'
2022-01-11 09:52:48.82 -> TimingLog -> '6356'
2022-01-11 09:53:02.68 -> TimingLog -> '5015'
2022-01-11 09:53:06.57 -> HandleArticleWarningOnOrder -> '17766'
2022-01-11 09:53:17.01 -> TimingLog -> '6339'
...

Answer

You can do a regex replace with a branch reset group that targets your specific lines:

^(.{26})(?|(HandleArticleWarningOnOrder ->)\h{2,}-1\b|(TimingLog ->)).*('\d+')

The pattern matches:

^ Start of the line
(.{26}) Capture group 1, match 26 characters (you could consider making this part of the pattern more specific)
(?| Branch reset group, so that both alternatives store their text in group 2
  - (HandleArticleWarningOnOrder ->)\h{2,}-1\b Capture the text in group 2, then match 2 or more horizontal whitespace characters, -1 and a word boundary to prevent a partial match such as -12
  - | Or
  - (TimingLog ->) Capture group 2, match literally
) Close branch reset group
.* Match the rest of the line; being greedy, it backtracks only as far as needed for the next group to match
('\d+') Capture the last occurrence of one or more digits between single quotes in group 3

In the replacement use $1$2 $3 (group 1, group 2, a space, then group 3).
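Python's built-in re module supports neither the branch reset group (?|...) nor \h, so the pattern above cannot be used there verbatim. A rough equivalent, offered only as a sketch, uses two ordinary groups (only one of which takes part in any given match) and [ \t] in place of \h:

```python
import re

# Group 1: the first 26 characters (timestamp plus " -> ").
# Group 2 or 3: the log name, whichever alternative matched.
# Group 4: the last quoted number; greedy .* backtracks to reach it.
BRANCH_PATTERN = re.compile(
    r"^(.{26})"
    r"(?:(HandleArticleWarningOnOrder ->)[ \t]{2,}-1\b"
    r"|(TimingLog ->))"
    r".*('\d+')"
)


def shorten_branch(line: str) -> str:
    """Shorten a matching line; return other lines unchanged."""

    def repl(m):
        name = m.group(2) or m.group(3)  # only one of the two groups is set
        return f"{m.group(1)}{name} {m.group(4)}"

    return BRANCH_PATTERN.sub(repl, line)
```

Without a branch reset group, the code has to check which of groups 2 and 3 matched; that bookkeeping is exactly what (?|...) saves you in the editor.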

A possibly simpler pattern uses \K and a single capture group:

^.{26}(?:HandleArticleWarningOnOrder ->(?=\h{2,}-1\b)|TimingLog ->)\K.*('\d+')

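In the replacement use a space followed by $1. Because \K resets the start of the match, the timestamp and the "TimingLog ->" or "HandleArticleWarningOnOrder ->" text are left untouched and only the rest of the line is replaced.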
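Python's re module has no \K either. One way to imitate it, sketched below under that assumption, is to capture everything that would come before \K in group 1 and write it back in the replacement. The lookahead (?=...) works the same way in Python:

```python
import re

# Group 1 holds everything the editor pattern matches before \K,
# including the lookahead check that the warning line starts with -1.
KEEP_PATTERN = re.compile(
    r"^(.{26}(?:HandleArticleWarningOnOrder ->(?=[ \t]{2,}-1\b)|TimingLog ->))"
    r".*('\d+')"
)


def shorten_keep(line: str) -> str:
    # "\1 \2": the kept prefix, a space, then the last quoted number
    return KEEP_PATTERN.sub(r"\1 \2", line)
```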


Answer

You can use this regex to replace each line:

^(.*->)( *(-)?\d*(\.)?\d*: ('\d{1,}\'))*$


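Then replace with $1 $5: group 1 (everything up to the last ->), a space, and group 5 (the last quoted value, because a repeated group keeps only its final capture).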
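The replacement relies on a repeated group keeping only its last capture. Python's re module behaves the same way, so the pattern can be checked there unchanged (written as a raw string below). This is an illustrative sketch, not part of the original answer:

```python
import re

# The answer's pattern as a Python raw string. Group 2 repeats once per
# "-N: 'value'" pair; only its last repetition is remembered, so group 5
# holds the final quoted value on the line.
REPEAT_PATTERN = re.compile(r"^(.*->)( *(-)?\d*(\.)?\d*: ('\d{1,}\'))*$")


def shorten_repeat(line: str) -> str:
    # Group 1 ends in "->", so a space goes between it and group 5
    return REPEAT_PATTERN.sub(r"\1 \5", line)
```

Note that this pattern does not look at the log name, so unlike the first answer it would also shorten HandleArticleWarningOnOrder lines that do not start with -1.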