I have a string like the following:
sql = """DROP TABLE IF EXISTS table1;
ALTER TABLE table1 DROP PRIMARY KEY;
INSERT INTO table1 (id, created, name, telefonnummer, erPatient_id) VALUES
(1, '2015-08-06 12;09:08', ' ', ' ', 16528),
(2, '2015-08-06 12:43:11', ' ', ' ', 16529)
;
INSERT INTO table2 (comment, id) VALUES
('hello this is a semicolon ;', 2);"""
And I would like to get the statement INSERT INTO table1:
INSERT INTO table1 (id, created, name, telefonnummer, erPatient_id) VALUES
(1, '2015-08-06 12;09:08', ' ', ' ', 16528),
(2, '2015-08-06 12:43:11', ' ', ' ', 16529)
;
I can't split the string with sql.split(';)
because there are semicolons within the VALUES to insert.
I have tried with regex with no success:
import re
pattern_string = r"INSERT INTO bezugsperson[(]*[^)] \)[^)]"
q = re.findall(pattern_string, data, re.MULTILINE | re.DOTALL)
In the real string there will be thousands of values to insert and dozens of tables.
CodePudding user response:
Use
import re
pattern_string = r"\bINSERT INTO \w \s.*?;\s*$"
q = re.findall(pattern_string, data, re.MULTILINE | re.DOTALL)
See regex proof.
EXPLANATION
NODE EXPLANATION
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
INSERT INTO 'INSERT INTO '
--------------------------------------------------------------------------------
\w word characters 1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
.*? any character (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
; ';'
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
$ end of a line
CodePudding user response:
Regular expression to capture everything between INSERT INTO table1
and \n;
:
pattern_string = "INSERT INTO table1 .*\n;"
q = re.findall(pattern_string, sql, re.DOTALL)