How to use OR operator in Python Regex?

Time:04-28

I have two different types of OS, and each returns the output of service tomcat status in a different format.

I have a method to get the tomcat process ID.

import re

def find_pid(pattern, status):
    # re.DOTALL lets '.' match newlines in multi-line status output
    m = re.match(pattern, status, re.DOTALL)
    pid = m.group(1)
    return pid
print(find_pid(pattern, status))

On OS type 1, service tomcat status returns:

status = 'jsvc (pid  2164) is running...'

Pattern used to get PID

pattern = r'.*pid\s+(\d+).*running.*'
print(find_pid(pattern, status))  # prints 2164

On OS type 2, service tomcat status returns:

status = '''tomcat.service - Tomcat Server
   Loaded: loaded ( enabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-11-18 13:33:00 PST; 1min 47s ago
 Main PID: 2800 (jsvc)
   CGroup: /system/tomcat.service
'''

Pattern used to get PID

pattern = r'.*PID:\s+(\d+).*'
print(find_pid(pattern, status))  # prints 2800

Now I don't want to check the OS type, and I don't want to pass a different pattern for each OS.

The pattern should extract the PID regardless of which OS type we are checking.

I wrote a method that combines the two patterns above with the '|' operator.

def find_pid(status):
    pattern = r'.*pid\s+(\d+).*running.*|.*PID:\s+(\d+).*'
    m = re.match(pattern, status, re.DOTALL)
    pid = m.groups(0)[0]
    return(pid)
print(find_pid(status))

This method does not work for either status output.

I need a pattern that matches both types of status output and returns the PID.

PS: If possible, I need a solution that works in both Python 2 and 3, because some test VMs run Python 2 and others run Python 3 (we are in the middle of porting from 2 to 3).

CodePudding user response:

You could use a case-insensitive match (flag re.I), :? because only one of the strings has a : after PID, and (\d{1,5}) to capture the digits (at most 5, an assumed OS limit).

import re

os1_status = 'jsvc (pid  2164) is running...'
os2_status = '''tomcat.service - Tomcat Server
   Loaded: loaded ( enabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-11-18 13:33:00 PST; 1min 47s ago
 Main PID: 2800 (jsvc)
   CGroup: /system/tomcat.service
'''

for status in (os1_status, os2_status):
    match = re.search(r'pid:?\s+(\d{1,5})', status, re.I)
    print(match.group(1))
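
If the status text contains no PID at all (for example, when the service is stopped), re.search returns None and match.group(1) raises AttributeError. A minimal sketch of a wrapper that guards against this, reusing the name find_pid from the question; returning None when nothing is found is an assumption about what the caller wants, and the 'tomcat is stopped' string is a made-up input for illustration:

```python
import re

# Compiled once; re.I makes "pid" match "PID" as well.
PID_RE = re.compile(r'pid:?\s+(\d{1,5})', re.I)

def find_pid(status):
    """Return the PID found in status as a string, or None if absent."""
    match = PID_RE.search(status)
    return match.group(1) if match else None

print(find_pid('jsvc (pid  2164) is running...'))  # 2164
print(find_pid(' Main PID: 2800 (jsvc)'))          # 2800
print(find_pid('tomcat is stopped'))               # None (hypothetical input)
```

Note that \d{1,5} captures only the first five digits of a longer number; replacing it with \d+ avoids relying on the assumed limit. Calling print() with a single argument behaves the same in Python 2 and 3.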

CodePudding user response:

Taking into account the test cases:

jsvc (pid  2164) is running...

and

tomcat.service - Tomcat Server
   Loaded: loaded ( enabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-11-18 13:33:00 PST; 1min 47s ago
 Main PID: 2800 (jsvc)
   CGroup: /system/tomcat.service

I suggest using the following universal pattern:

r".*[Pp][Ii][Dd]:?\s+(\d+)"

Note that I used [Pp][Ii][Dd] to accept any case without needing flags. If you are in control of the flags, you can use re.IGNORECASE instead, e.g.

import re
test1 = '''jsvc (pid  2164) is running...'''
test2 = '''tomcat.service - Tomcat Server
   Loaded: loaded ( enabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-11-18 13:33:00 PST; 1min 47s ago
 Main PID: 2800 (jsvc)
   CGroup: /system/tomcat.service'''
print(re.search(r".*pid:?\s+(\d+)", test1, re.IGNORECASE).group(1))
print(re.search(r".*pid:?\s+(\d+)", test2, re.IGNORECASE).group(1))

output

2164
2800
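
The '|' approach from the question can also be made to work. The sketch below shows one likely reason it failed, assuming the question's patterns were meant to use \s+ and (\d+): each alternative carries its own capture group, so for the second status the PID ends up in group 2 while group 1 is unmatched. Restricting the alternation to the prefix leaves a single group. The shortened os2_status and the name prefix_or are illustrative only.

```python
import re

os1_status = 'jsvc (pid  2164) is running...'
# Shortened version of the second status output, for illustration.
os2_status = 'tomcat.service - Tomcat Server\n Main PID: 2800 (jsvc)\n'

# Two alternatives, each with its own capture group.
combined = r'.*pid\s+(\d+).*running.*|.*PID:\s+(\d+).*'
m1 = re.match(combined, os1_status, re.DOTALL)
m2 = re.match(combined, os2_status, re.DOTALL)
print(m1.groups())  # ('2164', None): PID is in group 1
print(m2.groups())  # (None, '2800'): PID is in group 2

# Alternation limited to the prefix, so only one capture group exists.
prefix_or = re.compile(r'(?:pid|PID:)\s+(\d+)')

def find_pid(status):
    """Return the PID as a string, or None if no prefix matches."""
    m = prefix_or.search(status)
    return m.group(1) if m else None

print(find_pid(os1_status))  # 2164
print(find_pid(os2_status))  # 2800
```

With the two-group pattern, m.group(1) or m.group(2) would pick whichever alternative matched. Python's re module does not allow the same group name to be defined twice in one pattern, so naming both groups identically is not an option here.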