Home > OS >  Finding the words immediate before and after a string dynamically specified
Finding the words immediate before and after a string dynamically specified

Time:12-01

I have a strings below:

CREATE VIEW [dbo].[TestView] AS SELECT T1.Col1,T1.Col2,T2.Col1,T2.Col2,T3.Col1,T3.Col2
FROM table_1 T1 LEFT JOIN table_2 T2 ON T1.Col1 = T2.Col2 INNER JOIN Table_3 T3 
ON T1.Col2 = T3.Col2

Objective

I want to group the list of Tables and their column Names. So I want a dataframe like below:

TableName    Alias     ColumnName
  table_1     T1         Col1
  table_1     T1         Col2
  table_2     T2         Col1
  table_2     T2         Col1
  table_3     T3         Col1
  table_3     T3         Col1

As a first step, I need to use regex to get the list strings immediately before and after the string T and Number i.e. T1,T2 and T3.

I have tried with :

for i in range(1,4):
  part_i = str('T') str(i) '.'
  str_i = s.partition(part_i)
  str_i_before = str_i[0]
  str_i_after = str_i[2].split('AS')[0]

But the output is not correct. I think I need to use regex here. I have tried below pattern using re.findall() but no luck

'(. ?) T^([\s\d] )$(. ?)'

Any clue?

CodePudding user response:

Looking at the broader aim, I would approach it as follows:

  • With a regex, first identify the tables and their aliases, which occur after FROM or JOIN keywords. No need to assume aliases start with "T".

  • Collect this information in a dictionary keyed by the table alias, and with a pair as corresponding data: a table name and an empty set for collecting column names.

  • Create a search pattern that finds one of the aliases followed by a point and a column name, and add the found columns in the above mentioned sets.

  • Convert that dictionary into the flat list of (table, alias, column) tuples.

Here is a function that does that:

def getcolumns(s):
    tables = {
        alias: [table, set()]
        for table, alias in re.findall(r"\b(?:FROM|JOIN)\s*(\w )\s*(\w )", s)
    }
    
    for alias, column in re.findall(rf"\b({'|'.join(tables.keys())})\.(\w )", s):
        tables[alias][1].add(column)
    
    return [
        (table, alias, column)
        for alias, (table, columns) in tables.items()
        for column in columns
    ]


# The example input:
s = """CREATE VIEW [dbo].[TestView] AS SELECT T1.Col1,T1.Col2,T2.Col1,T2.Col2,T3.Col1,T3.Col2
FROM table_1 T1 LEFT JOIN table_2 T2 ON T1.Col1 = T2.Col2 INNER JOIN Table_3 T3 
ON T1.Col2 = T3.Col2"""

lst = getcolumns(s)

lst will be:

[
    ('table_1', 'T1', 'Col1'), 
    ('table_1', 'T1', 'Col2'),
    ('table_2', 'T2', 'Col1'),
    ('table_2', 'T2', 'Col2'),
    ('Table_3', 'T3', 'Col1'),
    ('Table_3', 'T3', 'Col2')
]
  • Related