I need help with one of my current tasks wherein i am trying to pick only the table names from the query via Python
So basically lets say a query looks like this
Create table a.dummy_table1
as
select a.dummycolumn1,a.dummycolumn2,a.dummycolumn3 from dual
Now i am passing this query into Python using STRINGIO and then reading only the strings where it starts with "a" and has "_" in it like below
table_list = set(re.findall(r'\ba\.\w ', str(data)))
Here data is the dataframe in which i have parsed the query using StringIO
now in table_list i am getting the below output
a.dummy_table1
a.dummycolumn1
a.dummycolumn2
whereas the Expected output should have been like
a.dummy_table1
<Let me know how we can get this done , have tried the above regular expression but that is not working properly>
Any help on same would be highly appreciated
CodePudding user response:
Your current regex string r"\ba.\w " simply matches any string which:
- Begins with "a" (the "\ba" part)
- Followed by a period (the "." part)
- Followed by 1 or more alphanumeric characters (the "\w " part).
If I've understood your problem correctly, you are looking to extract from str(data) any string fragments which match this pattern instead:
- Begins with "a"
- Followed by a period
- Followed by 1 or more alphanumeric characters
- Followed by an underscore
- Followed by 1 or more alphanumeric characters
Thus, the regular expression should have "_\w " added to the end to match criteria 4 and 5:
table_list = set(re.findall(r"\ba\.\w _\w ", str(data)))