I want to extract the time (the number between "in" and "ms") from the line that contains "Extracted" in the example below.
2022-01-16 13:35:30,591 CET INFO log.persistence 1103:0038 [ebx-scheduler-worker-1] Committed 1 changes in Java Cache; in 16 ms. 2022-01-16 13:35:27,049 CET INFO log.persistence 1103:0038 [ebx-rro-226-unique] Heart beat #148000: avgInterval=1007: maxInterval=2360 ms. 2022-01-11 20:28:10,324 CET INFO log.persistence 1103:0038 [ebx-boot] Extracted archive Archive_ZipFile[D:\Tomcat9054\webapps\ebx_common-resources\configuration\mima\export\MIMA\files\data\Directory_Configuration\Configuration\archives\Configuration.ebx]@26ec4785 in 16 ms. 2022-01-11 20:28:04,120 CET INFO log.persistence 1103:0038 [ebx-boot] Inserted 174 tableHolders pointing to home 18490 in 0 ms.
I already have the expression to select the line that I want but can't add the condition to extract the number between the two words...
^.*\bExtracted\b.*$
Would appreciate if anybody can help.
Thanks in advance
CodePudding user response:
Here's my answer
import re
# Raw string, so the backslashes in the Windows path are kept literally
text = r''' 2022-01-16 13:35:30,591 CET INFO log.persistence 1103:0038 [ebx-scheduler-worker-1] Committed 1 changes in Java Cache; in 16 ms. 2022-01-16 13:35:27,049 CET INFO log.persistence 1103:0038 [ebx-rro-226-unique] Heart beat #148000: avgInterval=1007: maxInterval=2360 ms. 2022-01-11 20:28:10,324 CET INFO log.persistence 1103:0038 [ebx-boot] Extracted archive Archive_ZipFile[D:\Tomcat9054\webapps\ebx_common-resources\configuration\mima\export\MIMA\files\data\Directory_Configuration\Configuration\archives\Configuration.ebx]@26ec4785 in 16 ms. 2022-01-11 20:28:04,120 CET INFO log.persistence 1103:0038 [ebx-boot] Inserted 174 tableHolders pointing to home 18490 in 0 ms.'''
# .+? lazily skips ahead to the first "in <number> ms" after "Extracted"
res = re.search(r'(?:Extracted.+?)(?:in\s(\d+)\sms)', text)
print(res.group(1))
Regex Explanation:
(?: <regex_pattern>)
means non-capturing group
\s
means whitespace
(\d+)
captures one or more digits
(?:Extracted.+?)
matches 'Extracted' followed by as few characters as possible, up to the next part of the pattern
In short, the regex matches from 'Extracted' until it finds in <number> ms
and captures the <number>
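If the log is processed one line at a time, the same idea can be wrapped in a small function. This is only a sketch: the helper name extracted_ms is made up for illustration, and it assumes each log entry arrives as its own string.

```python
import re

# Same idea as above, with word boundaries added around "Extracted",
# "in" and "ms" so that partial words are not matched.
PATTERN = re.compile(r'\bExtracted\b.+?\bin\s(\d+)\sms\b')

def extracted_ms(line):
    """Hypothetical helper: return the time in ms as an int, or None
    if the line has no 'Extracted ... in <number> ms' part."""
    match = PATTERN.search(line)
    return int(match.group(1)) if match else None

# Three entries taken from the question, one string per entry.
lines = [
    r"2022-01-16 13:35:30,591 CET INFO log.persistence 1103:0038 [ebx-scheduler-worker-1] Committed 1 changes in Java Cache; in 16 ms.",
    r"2022-01-11 20:28:10,324 CET INFO log.persistence 1103:0038 [ebx-boot] Extracted archive Archive_ZipFile[D:\Tomcat9054\webapps\ebx_common-resources\configuration\mima\export\MIMA\files\data\Directory_Configuration\Configuration\archives\Configuration.ebx]@26ec4785 in 16 ms.",
    r"2022-01-11 20:28:04,120 CET INFO log.persistence 1103:0038 [ebx-boot] Inserted 174 tableHolders pointing to home 18490 in 0 ms.",
]

results = [extracted_ms(line) for line in lines]
print(results)  # [None, 16, None]
```

Only the entry containing Extracted yields a number; the other entries return None even though they also end in "in <number> ms".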
CodePudding user response:
If you want to find the number on the same line, and you don't want the match to cross a dot that is followed by a space:
\bExtracted\b[^.]*(?:\.(?! )[^.]*)*\sin\s+(\d+)\s+ms\b
Explanation
\bExtracted\b
Match the word Extracted between word boundaries
[^.]*
Optionally match any char except a dot
(?:\.(?! )[^.]*)*
Repeat matching a dot not followed by a space, and then again any char except a dot
\sin\s+
Match in between whitespace chars
(\d+)
Capture 1 or more digits in group 1
\s+ms\b
Match 1 or more whitespace chars, then ms followed by a word boundary
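For completeness, here is a minimal sketch of using that pattern from Python. It assumes the entries are separated by a dot and a space, as in the question's sample; the variable names are only illustrative.

```python
import re

PATTERN = re.compile(
    r'\bExtracted\b[^.]*(?:\.(?! )[^.]*)*\sin\s+(\d+)\s+ms\b'
)

# Two entries from the question, joined with a space so that each entry
# ends in "ms. " (dot followed by a space), which is where the match stops.
log = " ".join([
    r"2022-01-11 20:28:10,324 CET INFO log.persistence 1103:0038 [ebx-boot] Extracted archive Archive_ZipFile[D:\Tomcat9054\webapps\ebx_common-resources\configuration\mima\export\MIMA\files\data\Directory_Configuration\Configuration\archives\Configuration.ebx]@26ec4785 in 16 ms.",
    r"2022-01-11 20:28:04,120 CET INFO log.persistence 1103:0038 [ebx-boot] Inserted 174 tableHolders pointing to home 18490 in 0 ms.",
])

match = PATTERN.search(log)
elapsed = match.group(1) if match else None
print(elapsed)  # 16
```

The dot in Configuration.ebx is followed by a letter, so the pattern passes over it, while the dot after "16 ms" is followed by a space and ends the match. Note that [^.] also matches a newline, so if the entries are separated by newlines instead, it may be safer to apply the pattern to one line at a time.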