Python - Extract and Split string using regex with position

Time:05-08

I have a string like this.

'2022-05-08T04:18:43Z UTC [ db=dev user=test pid=1073922359 userid=100 xid=2063 ]' LOG: alter table my_table alter column string type varchar(16);

I want to extract the following items using regex. I tried a couple of patterns, but they didn't work.

2022-05-08T04:18:43Z - Timestamp

dev - Database name

test - Database user

alter table my_table alter column string type varchar(16) - Query

CodePudding user response:

Based on the format you have given, this regex should do what you want.

'(?P<ts>\d{4}-\d\d-\d\dT\d\d:\d\d:\d\dZ).*?\bdb=(?P<db>\S*).*?\buser=(?P<user>\S*).*?LOG:\s*(?P<query>.*?);?$

It uses named capture groups to match the timestamp, database, user and query parts of the string. The timestamp is assumed to be in the same format as in your string, i.e. yyyy-mm-ddThh:mm:ssZ. The database and user fields are any run of non-space characters after the appropriate tag (db= or user=), and the query is everything after LOG: up to an optional trailing ; at the end of the line.


You can use that in Python like this:

import re

# The sample log line; named "line" so it does not shadow the built-in str.
line = "'2022-05-08T04:18:43Z UTC [ db=dev user=test pid=1073922359 userid=100 xid=2063 ]' LOG: alter table my_table alter column string type varchar(16);"
m = re.match(r"'(?P<ts>\d{4}-\d\d-\d\dT\d\d:\d\d:\d\dZ).*?\bdb=(?P<db>\S*).*?\buser=(?P<user>\S*).*?LOG:\s*(?P<query>.*?);?$", line)
if m is not None:
    print(m.groupdict())

Output:

{
 'ts': '2022-05-08T04:18:43Z',
 'db': 'dev',
 'user': 'test',
 'query': 'alter table my_table alter column string type varchar(16)'
}
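
For readability, the same pattern can be written with re.VERBOSE so that each piece carries its own comment. This is only a sketch: the names LOG_LINE and sample are made up for illustration, and it assumes \s* after LOG: so that the single space in the sample line matches.

```python
import re

# Same pattern, split across lines; re.VERBOSE ignores the layout whitespace
# and treats everything after "#" as a comment.
LOG_LINE = re.compile(
    r"""
    '(?P<ts>\d{4}-\d\d-\d\dT\d\d:\d\d:\d\dZ)   # timestamp right after the opening quote
    .*?\bdb=(?P<db>\S*)                        # database: non-space run after db=
    .*?\buser=(?P<user>\S*)                    # user: non-space run after user=
    .*?LOG:\s*(?P<query>.*?)                   # query text following LOG:
    ;?$                                        # optional trailing semicolon, then end of line
    """,
    re.VERBOSE,
)

sample = "'2022-05-08T04:18:43Z UTC [ db=dev user=test pid=1073922359 userid=100 xid=2063 ]' LOG: alter table my_table alter column string type varchar(16);"

m = LOG_LINE.match(sample)
if m is not None:
    print(m.group("ts"), m.group("db"), m.group("user"))
    print(m.group("query"))
```

Because \buser= requires the = directly after user, the userid=100 field is not picked up by mistake.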

Note that this regex assumes the db tag comes before the user tag. If that might not be the case, you can work around it by putting those capture groups inside lookaheads, e.g.

'(?P<ts>\d{4}-\d\d-\d\dT\d\d:\d\d:\d\dZ)(?=.*?\bdb=(?P<db>\S*))(?=.*?\buser=(?P<user>\S*)).*?LOG:\s*(?P<query>.*?);?$

This will then work with a string like

'2022-05-08T04:18:43Z UTC [ user=test db=dev pid=1073922359 userid=100 xid=2063 ]' LOG: alter table my_table alter column string type varchar(16);
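
As a rough check, the lookahead version can be run against both orderings in Python. The names PATTERN, parse_line, db_first and user_first are just illustrative, and the pattern is assumed to use \s* after LOG: so the single space in the log line matches.

```python
import re

# Lookahead variant: db= and user= are each located by their own lookahead
# from the same position, so their relative order does not matter.
PATTERN = re.compile(
    r"'(?P<ts>\d{4}-\d\d-\d\dT\d\d:\d\d:\d\dZ)"
    r"(?=.*?\bdb=(?P<db>\S*))"
    r"(?=.*?\buser=(?P<user>\S*))"
    r".*?LOG:\s*(?P<query>.*?);?$"
)


def parse_line(line):
    """Return the named fields as a dict, or None if the line does not match."""
    m = PATTERN.match(line)
    return m.groupdict() if m is not None else None


db_first = "'2022-05-08T04:18:43Z UTC [ db=dev user=test pid=1073922359 userid=100 xid=2063 ]' LOG: alter table my_table alter column string type varchar(16);"
user_first = "'2022-05-08T04:18:43Z UTC [ user=test db=dev pid=1073922359 userid=100 xid=2063 ]' LOG: alter table my_table alter column string type varchar(16);"

# Both orderings should yield the same fields.
print(parse_line(db_first))
print(parse_line(user_first) == parse_line(db_first))
```

Groups captured inside a lookahead keep their values after the lookahead succeeds, which is why groupdict() still contains db and user here.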
