Home > Back-end >  Regex changes to a DDL (Java)
Regex changes to a DDL (Java)

Time:06-30

I have a process that gets a DDL from Impala and makes a few changes for it to work on SQL Server.

I get something like this from Impala

CREATE EXTERNAL TABLE xxx.yyy (
year INT,
day INT,
mmm_yyyy DATE,
2target_revenue_day DECIMAL(38,6),
2budget_day DECIMAL(38,6),
last_6_months STRING,
load_timestamp TIMESTAMP
)
STORED AS PARQUET
LOCATION 's3a://xxx'
TBLPROPERTIES ('')

I managed to remove the "EXTERNAL TABLE" bit as I only need "TABLE", changed "STRING" to "VARCHAR" and "TIMESTAMP" to "DATETIME2". Also removed the bit at the bottom, i.e STORED AS PARQUET LOCATION 's3a://xxx' TBLPROPERTIES ('')

My problem is, some of the column names like year, day and 2target_revenue_day I am going to need to wrap in quotes otherwise script won't work (reserved words, name starts with a digit). I need to find a way to either wrap all column names in quotes or just the ones which are reserved words and start with a digit.

Any idea how to go about it?

Thank you

CodePudding user response:

You could key the pattern off of a word immediately preceding one of a set of known data types. Depending on when you perform that step, you'll need to customize that list to match either the Impala or the SQL Server types.

(\w )\s (?:BOOLEAN|CHAR|DATE|DECIMAL|DOUBLE|FLOAT|INT|REAL|STRING|TIMESTAMP|VARCHAR|etc)

CodePudding user response:

With regards to columns start with a digit, this has worked for me:

variable.replaceAll("(\\d{1}[a-z] [a-z0-9_]*)", "\"$0\"");

It finds anything with a number in the beginning of the column name and wraps it in quotes.

With regards to reserved words, I've had to manually look for words like year, month, day, date, etc. and replace them a quoted name, e.g "year", "month", etc.

variable.replace(" date ", " \"date\" ").replace(" year ", " \"year\" ").replace(" month ", " \"month\" ").replace(" day ", " \"day\" ");

I hope someone will find this useful.

  • Related