Extract a field from a specific line with regular expressions

Time:10-07

I want to extract the time (the number between "in" and "ms") from the line that contains "Extracted" in the example below.

2022-01-16 13:35:30,591 CET INFO log.persistence 1103:0038 [ebx-scheduler-worker-1] Committed 1 changes in Java Cache; in 16 ms.
2022-01-16 13:35:27,049 CET INFO log.persistence 1103:0038 [ebx-rro-226-unique] Heart beat #148000: avgInterval=1007: maxInterval=2360 ms.
2022-01-11 20:28:10,324 CET INFO log.persistence 1103:0038 [ebx-boot] Extracted archive Archive_ZipFile[D:\Tomcat9054\webapps\ebx_common-resources\configuration\mima\export\MIMA\files\data\Directory_Configuration\Configuration\archives\Configuration.ebx]@26ec4785 in 16 ms.
2022-01-11 20:28:04,120 CET INFO log.persistence 1103:0038 [ebx-boot] Inserted 174 tableHolders pointing to home 18490 in 0 ms.

I already have the expression to select the line that I want but can't add the condition to extract the number between the two words...

^.*\bExtracted\b.*$

Would appreciate if anybody can help.

Thanks in advance

CodePudding user response:

Here's my answer

import re

# Raw string, so the backslashes in the Windows path are not read as escape sequences.
# Named "log" rather than "str" to avoid shadowing the built-in type.
log = r'''2022-01-16 13:35:30,591 CET INFO log.persistence 1103:0038 [ebx-scheduler-worker-1] Committed 1 changes in Java Cache; in 16 ms.
2022-01-16 13:35:27,049 CET INFO log.persistence 1103:0038 [ebx-rro-226-unique] Heart beat #148000: avgInterval=1007: maxInterval=2360 ms.
2022-01-11 20:28:10,324 CET INFO log.persistence 1103:0038 [ebx-boot] Extracted archive Archive_ZipFile[D:\Tomcat9054\webapps\ebx_common-resources\configuration\mima\export\MIMA\files\data\Directory_Configuration\Configuration\archives\Configuration.ebx]@26ec4785 in 16 ms.
2022-01-11 20:28:04,120 CET INFO log.persistence 1103:0038 [ebx-boot] Inserted 174 tableHolders pointing to home 18490 in 0 ms.'''

res = re.search(r'(?:Extracted.+?)(?:in\s(\d+)\sms)', log)
print(res.group(1))  # the digits between "in" and "ms"

Regex explanation: (?:<regex_pattern>) is a non-capturing group

\s matches a single whitespace character

(\d+) captures one or more digits

(?:Extracted.+?) matches 'Extracted' followed by as few characters as possible

Basically, the regex matches from 'Extracted' up to the first "in <number> ms" and captures the <number>.
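If you would rather keep the line-selection expression from the question and do the extraction as a second step, here is a rough sketch. It assumes the log has one entry per line; the names LINE_RE, TIME_RE and extract_times are made up for this example.

```python
import re

# Sample entries copied from the question, one per line (assumption: the real log is line-based).
LOG = r'''2022-01-16 13:35:30,591 CET INFO log.persistence 1103:0038 [ebx-scheduler-worker-1] Committed 1 changes in Java Cache; in 16 ms.
2022-01-11 20:28:10,324 CET INFO log.persistence 1103:0038 [ebx-boot] Extracted archive Archive_ZipFile[D:\Tomcat9054\webapps\ebx_common-resources\configuration\mima\export\MIMA\files\data\Directory_Configuration\Configuration\archives\Configuration.ebx]@26ec4785 in 16 ms.
2022-01-11 20:28:04,120 CET INFO log.persistence 1103:0038 [ebx-boot] Inserted 174 tableHolders pointing to home 18490 in 0 ms.'''

# Step 1: the asker's expression; re.MULTILINE makes ^ and $ match at every line.
LINE_RE = re.compile(r'^.*\bExtracted\b.*$', re.MULTILINE)
# Step 2: the number between "in" and "ms".
TIME_RE = re.compile(r'\bin\s+(\d+)\s+ms\b')


def extract_times(log_text):
    """Return the "in N ms" value of every line that contains the word Extracted."""
    times = []
    for line_match in LINE_RE.finditer(log_text):
        time_match = TIME_RE.search(line_match.group(0))
        if time_match:
            times.append(int(time_match.group(1)))
    return times


print(extract_times(LOG))
```

Splitting it in two keeps each expression simple, at the cost of a second search per selected line.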

CodePudding user response:

If you want to find the number within the same entry, and the match must not cross a dot followed by a space:

\bExtracted\b[^.]*(?:\.(?! )[^.]*)*\sin\s+(\d+)\s+ms\b

Explanation

  • \bExtracted\b Match the word Extracted between word boundaries
  • [^.]* Optionally match any chars except a dot
  • (?:\.(?! )[^.]*)* Optionally repeat matching a dot not followed by a space, and then again any chars except a dot
  • \sin\s+ Match in between whitespace chars
  • (\d+) Capture 1+ digits in group 1
  • \s+ms\b Match 1+ whitespace chars, then ms followed by a word boundary
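A small Python check of the pattern above, as a sketch only. The first input is the tail of the log from the question written on a single line; the second is a made-up entry that has "Extracted" but no time of its own. The helper name extracted_ms is hypothetical.

```python
import re

PATTERN = re.compile(r'\bExtracted\b[^.]*(?:\.(?! )[^.]*)*\sin\s+(\d+)\s+ms\b')


def extracted_ms(text):
    """Return the time of the first Extracted entry as an int, or None if there is none."""
    match = PATTERN.search(text)
    return int(match.group(1)) if match else None


# Two entries from the question on one line, separated by ". "
real = (r'2022-01-11 20:28:10,324 CET INFO log.persistence 1103:0038 [ebx-boot] Extracted archive '
        r'Archive_ZipFile[D:\Tomcat9054\webapps\ebx_common-resources\configuration\mima\export\MIMA'
        r'\files\data\Directory_Configuration\Configuration\archives\Configuration.ebx]@26ec4785 in 16 ms. '
        r'2022-01-11 20:28:04,120 CET INFO log.persistence 1103:0038 [ebx-boot] '
        r'Inserted 174 tableHolders pointing to home 18490 in 0 ms.')

# Made-up input: the Extracted entry ends without a time, the next entry has one.
made_up = 'INFO [ebx-boot] Extracted archive demo. INFO [ebx-boot] Inserted 174 tableHolders in 0 ms.'

print(extracted_ms(real))
print(extracted_ms(made_up))
```

The dot inside Configuration.ebx is allowed because it is not followed by a space, while the ". " that ends an entry stops the match, so the time of a following entry is not picked up.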
