I have a string like this.
'2022-05-08T04:18:43Z UTC [ db=dev user=test pid=1073922359 userid=100 xid=2063 ]' LOG: alter table my_table alter column string type varchar(16);
I want to extract the following items using a regex. I tried a couple of patterns, but they didn't work.
2022-05-08T04:18:43Z - Timestamp
dev - Database name
test - Database user
alter table my_table alter column string type varchar(16) - Query
CodePudding user response:
Based on the format you have given, this regex should do what you want.
'(?P<ts>\d{4}-\d\d-\d\dT\d\d:\d\d:\d\dZ).*?\bdb=(?P<db>\S*).*?\buser=(?P<user>\S*).*?LOG:\s*(?P<query>.*?);?$
It uses named capture groups to match the timestamp, database, user and query parts of the string. The timestamp is assumed to be in the format in your string, i.e. yyyy-mm-ddThh:mm:ssZ; the database and user fields are any non-space characters after the appropriate tag; and the query is everything after LOG: up to an optional trailing ; at the end of the line.
You can use that in Python like this:
import re

# Sample log line from the question ("line" avoids shadowing the built-in str)
line = "'2022-05-08T04:18:43Z UTC [ db=dev user=test pid=1073922359 userid=100 xid=2063 ]' LOG: alter table my_table alter column string type varchar(16);"
# \s* after LOG: allows any amount of whitespace before the query
m = re.match(r"'(?P<ts>\d{4}-\d\d-\d\dT\d\d:\d\d:\d\dZ).*?\bdb=(?P<db>\S*).*?\buser=(?P<user>\S*).*?LOG:\s*(?P<query>.*?);?$", line)
if m is not None:
    print(m.groupdict())
Output:
{
'ts': '2022-05-08T04:18:43Z',
'db': 'dev',
'user': 'test',
'query': 'alter table my_table alter column string type varchar(16)'
}
Note this regex assumes the db tag comes before the user tag. If this might not be the case, you can work around that by putting those capture groups inside lookaheads, e.g.
'(?P<ts>\d{4}-\d\d-\d\dT\d\d:\d\d:\d\dZ)(?=.*?\bdb=(?P<db>\S*))(?=.*?\buser=(?P<user>\S*)).*?LOG:\s*(?P<query>.*?);?$
This will then work with a string like
'2022-05-08T04:18:43Z UTC [ user=test db=dev pid=1073922359 userid=100 xid=2063 ]' LOG: alter table my_table alter column string type varchar(16);
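As a rough sketch, the lookahead version can be wrapped in a small helper and tried against the reordered string. The names PATTERN and parse_log_line are made up for this illustration, and the pattern still assumes the log format shown in the question.

```python
import re

# Lookahead variant: the db= and user= tags may appear in either order.
# PATTERN and parse_log_line are illustrative names, not part of any library.
PATTERN = re.compile(
    r"'(?P<ts>\d{4}-\d\d-\d\dT\d\d:\d\d:\d\dZ)"  # timestamp right after the opening quote
    r"(?=.*?\bdb=(?P<db>\S*))"                   # look ahead for db=<value>
    r"(?=.*?\buser=(?P<user>\S*))"               # look ahead for user=<value> (not userid=)
    r".*?LOG:\s*(?P<query>.*?);?$"               # query text, minus an optional trailing ;
)


def parse_log_line(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = PATTERN.match(line)
    return m.groupdict() if m else None


reordered = "'2022-05-08T04:18:43Z UTC [ user=test db=dev pid=1073922359 userid=100 xid=2063 ]' LOG: alter table my_table alter column string type varchar(16);"
print(parse_log_line(reordered))
```

One thing to watch: each lookahead scans the rest of the line, so if a tag were missing from the bracketed section but text such as db= appeared inside the query itself, it would be picked up from there instead.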