In my code I have views defined like below.
VIEW Company_Person_Sd IS
Prompt = 'Company'
Company.Prompt = 'Company ID'
SELECT company_id company_id,
emp_no emp_no,
Get_Person(company_id, emp_no) person_id,
cp.rowid objid,
to_char(cp.rowversion) objversion,
rowkey objkey
FROM companies cp;
There can be more than one view defined in a single file (usually there are 20 or more).
I want to get the whole view using a regex in python.
I did the same thing to a method like below, using the following regex.(and it worked fine)
methodRegex = r"^\s*((FUNCTION|PROCEDURE)\s (\w ))(.*?)BEGIN(.*?)^END\s*(\w );"
methodMatches = re.finditer(methodRegex, fContent, re.DOTALL | re.MULTILINE | re.IGNORECASE | re.VERBOSE)
for methodMatchNum, methodMatch in enumerate(methodMatches, start=1):
methodContent=methodMatch.group()
methodNameFull=methodMatch.group(1)
methodType=methodMatch.group(2)
methodName=methodMatch.group(3)
method example
PROCEDURE Prepare___ (
attr_ IN OUT VARCHAR2 )
IS
----
BEGIN
--
END Prepare___;
PROCEDURE Insert___ (
attr_ IN OUT VARCHAR2 )
IS
----
BEGIN
--
END Insert___;
When I try to do the same for views, it gives the wrong output. Actually I couldn't find how to catch the end of the view. I tried with semicolon as well, which gave a wrong output.
My regex for views
viewRegex = r"^\s*(VIEW\s (\w ))(.*?)SELECT(.*?)^FROM\s*(\w );"
Please help me find out where I'm doing it wrong. Thanks in advance.
CodePudding user response:
You don't get any match with viewRegex
because it only matches when there are only word characters ([a-zA-Z0-9_]
) between FROM
and ;
. Whereas your example also includes a whitespace. So take whitespaces into account as well:
viewRegex = r"^\s*(VIEW\s (\w ))(.*?)SELECT(.*?)^FROM\s*([\w\s] );"
CodePudding user response:
If you have a lot of views in a single file, another option is to prevent using .*?
with re.DOTALL
to prevent unnecessary backtracking.
Instead, you can match the parts from VIEW
to SELECT
to FROM
checking that what is in between is not another one of the key words to prevent matching too much using a negative lookahead (Assuming these can not occur in between)
For the last part after FROM, you can match word characters, optionally repeated by whitespace chars and again word characters.
^(VIEW\s (\w ))(.*(?:\n(?!SELECT|VIEW|FROM).*)*)\nSELECT\s (.*(?:\n(?!SELECT|VIEW|FROM).*)*)\nFROM\s (\w (?:\s \w ));
The pattern matches:
^
Start of string(VIEW\s (\w ))
Capture group for VIEW followed by a group for the word characters(.*(?:\n(?!SELECT|VIEW|FROM).*)*)
Capture group matching the rest of the lines, and all lines that do not start with a keyword\nSELECT\s
Match a newline, SELECT and 1 whitespace cahrs(.*(?:\n(?!SELECT|VIEW|FROM).*)*)
Capture group matching the rest of the lines, and all lines that do not start with a keyword\nFROM\s
Match a newline, FROM and 1 whitespace chars(\w (?:\s \w ))
; Capture group for the value of FROM, matching 1 word characters and optionally repeated by whitespace chars and word characters
For example (You can omit the re.VERBOSE
and re.DOTALL
)
import re
methodRegex = r"^^(VIEW\s (\w ))(.*(?:\n(?!SELECT|VIEW|FROM).*)*)\nSELECT\s (.*(?:\n(?!SELECT|VIEW|FROM).*)*)\nFROM\s (\w (?:\s \w ));"
fContent = ("VIEW Company_Person_Sd IS\n"
" Prompt = 'Company'\n"
" Company.Prompt = 'Company ID'\n"
"SELECT company_id company_id,\n"
" emp_no emp_no,\n"
" Get_Person(company_id, emp_no) person_id,\n"
" cp.rowid objid,\n"
" to_char(cp.rowversion) objversion,\n"
" rowkey objkey\n"
"FROM companies cp;")
methodMatches = re.finditer(methodRegex, fContent, re.MULTILINE | re.IGNORECASE)
for methodMatchNum, methodMatch in enumerate(methodMatches, start=1):
methodContent = methodMatch.group()
methodNameFull = methodMatch.group(1)
methodType = methodMatch.group(2)
methodName = methodMatch.group(3)