I have a strings
below:
CREATE VIEW [dbo].[TestView] AS SELECT T1.Col1,T1.Col2,T2.Col1,T2.Col2,T3.Col1,T3.Col2
FROM table_1 T1 LEFT JOIN table_2 T2 ON T1.Col1 = T2.Col2 INNER JOIN Table_3 T3
ON T1.Col2 = T3.Col2
Objective
I want to group the list of Tables and their column Names. So I want a dataframe like below:
TableName Alias ColumnName
table_1 T1 Col1
table_1 T1 Col2
table_2 T2 Col1
table_2 T2 Col1
table_3 T3 Col1
table_3 T3 Col1
As a first step, I need to use regex
to get the list strings immediately before and after the string T and Number i.e. T1,T2 and T3.
I have tried with :
for i in range(1,4):
part_i = str('T') str(i) '.'
str_i = s.partition(part_i)
str_i_before = str_i[0]
str_i_after = str_i[2].split('AS')[0]
But the output is not correct. I think I need to use regex
here. I have tried below pattern using re.findall()
but no luck
'(. ?) T^([\s\d] )$(. ?)'
Any clue?
CodePudding user response:
Looking at the broader aim, I would approach it as follows:
With a regex, first identify the tables and their aliases, which occur after
FROM
orJOIN
keywords. No need to assume aliases start with "T".Collect this information in a dictionary keyed by the table alias, and with a pair as corresponding data: a table name and an empty set for collecting column names.
Create a search pattern that finds one of the aliases followed by a point and a column name, and add the found columns in the above mentioned sets.
Convert that dictionary into the flat list of (table, alias, column) tuples.
Here is a function that does that:
def getcolumns(s):
tables = {
alias: [table, set()]
for table, alias in re.findall(r"\b(?:FROM|JOIN)\s*(\w )\s*(\w )", s)
}
for alias, column in re.findall(rf"\b({'|'.join(tables.keys())})\.(\w )", s):
tables[alias][1].add(column)
return [
(table, alias, column)
for alias, (table, columns) in tables.items()
for column in columns
]
# The example input:
s = """CREATE VIEW [dbo].[TestView] AS SELECT T1.Col1,T1.Col2,T2.Col1,T2.Col2,T3.Col1,T3.Col2
FROM table_1 T1 LEFT JOIN table_2 T2 ON T1.Col1 = T2.Col2 INNER JOIN Table_3 T3
ON T1.Col2 = T3.Col2"""
lst = getcolumns(s)
lst
will be:
[
('table_1', 'T1', 'Col1'),
('table_1', 'T1', 'Col2'),
('table_2', 'T2', 'Col1'),
('table_2', 'T2', 'Col2'),
('Table_3', 'T3', 'Col1'),
('Table_3', 'T3', 'Col2')
]