I'm trying to parse an SQL statement (stored in a file) and only save the name of the fields so for the following:
SELECT
Involved_Party_ID=pt.ProgramId,
Involved_Party_Type=2,
First_Name=REPLACE(REPLACE(REPLACE(pt.ProgramName, '|', '/'), CHAR(10), ' '), CHAR(13), ' '),
Registration_Status=pt.RegistrationStatus,
Involved_Party_Status=CONVERT(SMALLINT, pt.ProgramStatus),
Creation_Date=pt.CreationDate,
Incorporation_Country=CONVERT(VARCHAR(50), CASE WHEN pd.IncorporationCountry='UK' THEN 'GB' ELSE pd.IncorporationCountry END),
FROM
SomeTable AS pt
GO
The desired output would be:
Involved_Party_ID
Involved_Party_Type
First_Name
Registration_Status
Involved_Party_Status
Creation_Date
Incorporation_Country
Here's my code:
import re
File1 = open("File1 .sql", "r")
File2 = open("File2 .sql", "w")
for line in File1:
if re.match('\s*SELECT\s*', line):
continue
if re.match('\s*FROM\s*', line):
break
if re.match('(\n).*?(?=(=))', line):
Field_Name = re.search('(\n).*?(?=(=))', line)
File2 .write(Field_Name.group(0) '\n')
I first tried using this regular expression:
'.*?(?=(=))'
But then my result came out as:
Involved_Party_ID
Involved_Party_Type
First_Name
Registration_Status
Involved_Party_Status
Creation_Date
Incorporation_Country
CONVERT(VARCHAR(50), CASE WHEN pd.IncorporationCountry
Now that I've added (\n)
to my regular expression the file returns completely empty although online regular-expression testing sites return the desired outcome.
(I'm not concerned about the whitespaces matching the regexp as I'm only retrieving the first result per line)
CodePudding user response:
Judging by the patterns you use with re.match
, you can do without regex here. Just skip the line that starts with SELECT
, stop matching at the line starting with FROM
and collect the parts of lines between them before the first =
:
File1 = open("File1 .sql", "r")
File2 = open("File2 .sql", "w")
for line in File1:
if line.strip().startswith("SELECT"):
continue
elif line.strip().startswith("FROM"):
break
else:
result = line.strip().split("=", 1)[0]
File2.write(result '\n')
The output I get is
Involved_Party_ID
Involved_Party_Type
First_Name
Registration_Status
Involved_Party_Status
Creation_Date
Incorporation_Country
See this Python demo.
CodePudding user response:
Try (regex101):
s = """\
SELECT
Involved_Party_ID=pt.ProgramId,
Involved_Party_Type=2,
First_Name=REPLACE(REPLACE(REPLACE(pt.ProgramName, '|', '/'), CHAR(10), ' '), CHAR(13), ' '),
Registration_Status=pt.RegistrationStatus,
Involved_Party_Status=CONVERT(SMALLINT, pt.ProgramStatus),
Creation_Date=pt.CreationDate,
Incorporation_Country=CONVERT(VARCHAR(50), CASE WHEN pd.IncorporationCountry='UK' THEN 'GB' ELSE pd.IncorporationCountry END),
FROM
SomeTable AS pt
GO"""
for v in re.findall(r"^([^=\s] )=", s, flags=re.M):
print(v)
Prints:
Involved_Party_ID
Involved_Party_Type
First_Name
Registration_Status
Involved_Party_Status
Creation_Date
Incorporation_Country