Regex get data after |

Time:06-30

This is my log:

2022-06-29 12:45:04.652 | INFO     | product.views.product_view:list:28 - SEARCH | Humana papa 
2022-06-29 12:45:04.652 | INFO     | product.views.product_view:list:28 - SEARCH | iPhone 12

I want to get everything after SEARCH |, for example Humana papa and iPhone 12.

I am using a regex. I tried the code below, but it only gets Humana and iPhone: r'SEARCH \| (\w+).*'

import re
from collections import Counter

inp = """2022-06-29 12:45:04.652 | INFO     | product.views.product_view:list:28 - SEARCH | Humana
2022-06-29 12:45:04.652 | INFO     | product.views.product_view:list:28 - SEARCH | Car
2022-06-29 12:45:04.652 | INFO     | product.views.product_view:list:28 - SEARCH | Phone 12 pro
2022-06-29 12:45:04.652 | INFO     | product.views.product_view:list:28 - SEARCH | Humana papa """
terms = re.findall(r'SEARCH \| (\w+).*', inp)

print(Counter(terms))  # Counter({'Humana': 2, 'Car': 1, 'Phone': 1})
print(Counter(terms).most_common(1)[0])  # ('Humana', 2)
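The reason only the first word comes back: \w matches letters, digits and underscore but not a space, so (\w+) stops at the first space and the trailing .* consumes the rest of the line outside the capture group. A minimal sketch on a single hypothetical line, comparing that pattern with one that captures .+ instead:

```python
import re

line = "SEARCH | Phone 12 pro"

# \w does not match a space, so (\w+) stops after "Phone";
# the trailing .* matches " 12 pro" but outside group 1.
first_word = re.search(r'SEARCH \| (\w+).*', line).group(1)

# Capturing .+ keeps everything up to the end of the line.
full_text = re.search(r'SEARCH \| (.+)', line).group(1)

print(first_word)  # Phone
print(full_text)   # Phone 12 pro
```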

What is the best way to get the full search term rather than just the first word?

Answer:

You can use

terms = re.findall(r'SEARCH\s*\|\s*(\S.*)', inp.strip())

Note the inp.strip() call: it removes the leading/trailing whitespace from the inp string, such as the trailing space after Humana papa.
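Because inp.strip() only trims the two ends of the whole string, it removes a trailing space after the last term but not trailing spaces on earlier lines. The sketch below uses made-up sample lines (the trailing space after Car is an assumption added for illustration) to compare the raw matches, the answer's strip() approach, and stripping each captured term instead:

```python
import re

PATTERN = r'SEARCH\s*\|\s*(\S.*)'
PREFIX = "2022-06-29 12:45:04.652 | INFO     | product.views.product_view:list:28 - "

# Hypothetical input: trailing space on a middle line ("Car ")
# and at the very end of the string ("Humana papa ").
inp = PREFIX + "SEARCH | Car \n" + PREFIX + "SEARCH | Humana papa "

raw = re.findall(PATTERN, inp)                               # no stripping
stripped = re.findall(PATTERN, inp.strip())                  # strip whole string
per_term = [t.rstrip() for t in re.findall(PATTERN, inp)]    # strip each match

print(raw)       # ['Car ', 'Humana papa ']
print(stripped)  # ['Car ', 'Humana papa']
print(per_term)  # ['Car', 'Humana papa']
```

For the log in the question, where only the last line has a trailing space, both approaches give the same result.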

The regex matches

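  • SEARCH - the literal word SEARCH
  • \s*\|\s* - a | char surrounded by zero or more whitespace characters on each side
  • (\S.*) - Group 1: a non-whitespace character followed by the rest of the line (. does not match a newline, so each match ends at the end of its line)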
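The same pattern can be written with re.VERBOSE so that each part carries its explanation as a comment. This is a sketch, not a different technique: in verbose mode unescaped whitespace in the pattern is ignored and # starts a comment, so the matching behavior is unchanged. The PREFIX helper is only there to rebuild the question's sample lines compactly.

```python
import re
from collections import Counter

SEARCH_TERM = re.compile(r"""
    SEARCH      # the literal word SEARCH
    \s*\|\s*    # a | char with optional whitespace on each side
    (\S.*)      # group 1: a non-whitespace char, then the rest of the line
""", re.VERBOSE)

# Rebuild the sample log from the question (note the trailing space on the last line).
PREFIX = "2022-06-29 12:45:04.652 | INFO     | product.views.product_view:list:28 - "
inp = "\n".join(PREFIX + "SEARCH | " + t
                for t in ["Humana", "Car", "Phone 12 pro", "Humana papa "])

terms = SEARCH_TERM.findall(inp.strip())
print(terms)  # ['Humana', 'Car', 'Phone 12 pro', 'Humana papa']

# With full terms, 'Humana' and 'Humana papa' are now counted separately.
counts = Counter(terms)
```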

Output:

>>> terms
['Humana', 'Car', 'Phone 12 pro', 'Humana papa']

See the regex demo.
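If a regex is not required, str.partition can do the same job. This is a swapped-in alternative, not the answer's method, and it assumes the separator is always exactly "SEARCH | " (one space on each side of |), whereas the regex tolerates any amount of whitespace there:

```python
PREFIX = "2022-06-29 12:45:04.652 | INFO     | product.views.product_view:list:28 - "
inp = "\n".join(PREFIX + "SEARCH | " + t
                for t in ["Humana", "Car", "Phone 12 pro", "Humana papa "])

# partition() splits at the first occurrence of the separator;
# element [2] is everything after it, which strip() then cleans up.
terms = [line.partition("SEARCH | ")[2].strip()
         for line in inp.splitlines()
         if "SEARCH | " in line]

print(terms)  # ['Humana', 'Car', 'Phone 12 pro', 'Humana papa']
```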
