Home > Mobile >  Need guidance with Regular Expression in Python
Need guidance with Regular Expression in Python

Time:09-23

I need help with one of my current tasks wherein i am trying to pick only the table names from the query via Python

So basically lets say a query looks like this

Create table a.dummy_table1
as
select a.dummycolumn1,a.dummycolumn2,a.dummycolumn3 from dual

Now i am passing this query into Python using STRINGIO and then reading only the strings where it starts with "a" and has "_" in it like below
table_list = set(re.findall(r'\ba\.\w ', str(data)))

Here data is the dataframe in which i have parsed the query using StringIO

now in table_list i am getting the below output

a.dummy_table1
a.dummycolumn1
a.dummycolumn2


whereas the Expected output should have been like 
a.dummy_table1

<Let me know how we can get this done , have tried the above regular expression but that is not working properly>

Any help on same would be highly appreciated

CodePudding user response:

Your current regex string r"\ba.\w " simply matches any string which:

  1. Begins with "a" (the "\ba" part)
  2. Followed by a period (the "." part)
  3. Followed by 1 or more alphanumeric characters (the "\w " part).

If I've understood your problem correctly, you are looking to extract from str(data) any string fragments which match this pattern instead:

  1. Begins with "a"
  2. Followed by a period
  3. Followed by 1 or more alphanumeric characters
  4. Followed by an underscore
  5. Followed by 1 or more alphanumeric characters

Thus, the regular expression should have "_\w " added to the end to match criteria 4 and 5:

table_list = set(re.findall(r"\ba\.\w _\w ", str(data)))
  • Related