This is my log:
2022-06-29 12:45:04.652 | INFO | product.views.product_view:list:28 - SEARCH | Humana papa
2022-06-29 12:45:04.652 | INFO | product.views.product_view:list:28 - SEARCH | iPhone 12
I want to get everything after SEARCH |, for example:
Humana papa
iPhone 12
I am using regex. I tried the code below, but it only gets Humana and iPhone: r'SEARCH \| (\w+).*'
import re
from collections import Counter
inp = """2022-06-29 12:45:04.652 | INFO | product.views.product_view:list:28 - SEARCH | Humana
2022-06-29 12:45:04.652 | INFO | product.views.product_view:list:28 - SEARCH | Car
2022-06-29 12:45:04.652 | INFO | product.views.product_view:list:28 - SEARCH | Phone 12 pro
2022-06-29 12:45:04.652 | INFO | product.views.product_view:list:28 - SEARCH | Humana papa """
terms = re.findall(r'SEARCH \| (\w+).*', inp)
print(Counter(terms)) # Counter({'Humana': 2, 'Car': 1, 'Phone': 1})
print(Counter(terms).most_common(1)[0]) # ('Humana', 2)
What is the best way to get the full search term instead of only the first word?
CodePudding user response:
You can use
terms = re.findall(r'SEARCH\s*\|\s*(\S.*)', inp.strip())
Note the inp.strip() call: it strips the leading and trailing whitespace from the inp string, so the last term does not end with a stray space.
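To see why the strip() call matters, here is a small sketch using two lines from the question's sample (the last one ends with a space). The variable names without_strip, with_strip and cleaned are only illustrative and not part of the original answer.

```python
import re

# Two lines from the question's sample; note the trailing space before the closing quotes.
inp = """2022-06-29 12:45:04.652 | INFO | product.views.product_view:list:28 - SEARCH | Humana
2022-06-29 12:45:04.652 | INFO | product.views.product_view:list:28 - SEARCH | Humana papa """

pattern = r'SEARCH\s*\|\s*(\S.*)'

# Without strip(), ".*" runs to the end of the line and keeps the trailing space.
without_strip = re.findall(pattern, inp)

# With strip(), the whitespace at the very end of the whole string is gone first.
with_strip = re.findall(pattern, inp.strip())

# Assumption beyond the answer: if lines in the middle of the log may also
# end with spaces, stripping each captured term is one way to handle it.
cleaned = [term.strip() for term in without_strip]

print(without_strip)
print(with_strip)
print(cleaned)
```

inp.strip() only trims the two ends of the whole string, so the per-term strip() is the safer choice when the log is not guaranteed to be clean.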
The regex matches:
SEARCH - the literal word SEARCH
\s*\|\s* - a | char enclosed with zero or more whitespace chars
(\S.*) - Group 1: a non-whitespace char and then the rest of the line.
Output:
>>> terms
['Humana', 'Car', 'Phone 12 pro', 'Humana papa']
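Putting this together with the Counter code from the question, a minimal sketch could look like the following. It reuses the question's sample input and only swaps in the pattern above; the name counts is illustrative.

```python
import re
from collections import Counter

# Sample log text taken from the question.
inp = """2022-06-29 12:45:04.652 | INFO | product.views.product_view:list:28 - SEARCH | Humana
2022-06-29 12:45:04.652 | INFO | product.views.product_view:list:28 - SEARCH | Car
2022-06-29 12:45:04.652 | INFO | product.views.product_view:list:28 - SEARCH | Phone 12 pro
2022-06-29 12:45:04.652 | INFO | product.views.product_view:list:28 - SEARCH | Humana papa """

# Capture everything after "SEARCH |" up to the end of each line.
terms = re.findall(r'SEARCH\s*\|\s*(\S.*)', inp.strip())

# Count the full search terms rather than only their first word.
counts = Counter(terms)

print(terms)
print(counts.most_common(1))
```

Because the whole term is captured now, 'Humana' and 'Humana papa' are counted as two different searches, each seen once, whereas the \w+ pattern from the question counted them together as 'Humana': 2.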