Modifying SQL Scripts with SED Command
I would like to explain my task. I have been asked to modify many files with one command without alter the data. The core is to properly insert DATE_B values, since it would give us an error trying to insert this into MYTABLE, due to the fact we create DATE_B as DATE datatype.
My Scripts
Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2017-08-01','YYYY-MM-DD'),'2019-09-09');
Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2018-10-30','YYYY-MM-DD'),'2021-09-09');
Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2022-08-11','YYYY-MM-DD'),'2022-08-06');
I have to modify all DATE_B values in order to insert DATE before the value. What I mean is:
Expected Scritps
Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2017-08-01','YYYY-MM-DD'),DATE'2019-09-09');
Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2018-10-30','YYYY-MM-DD'),DATE'2021-09-09');
Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2022-08-11','YYYY-MM-DD'),DATE'2022-08-06');
My approach was to use REGEX in order to identify the single quote, year, dash, month, dash, day, dash, single quote... But if I do this I will substitute also my properly formatted to_timestamp values.
I found this command from GeekForGeeks useful since we catch the first letter and we enclose them with parenthesis.
$ echo "Welcome To The Geek Stuff" | sed 's/\(\b[A-Z]\)/\(\1\)/g'
Trying to approach my goal:
$ echo "Welcome To The Geek Stuff '2010-10-12' '1999-10-10'" | sed -E 's/(\b[A-Z])/DATE\1/g'
Using regex:
$ echo "Welcome To The Geek Stuff '2010-10-12' '1999-10-10'" | sed -E 's/'('\d{4}-)'/DATE\1/g'
$ echo "Welcome To The Geek Stuff" ,'1999-10-10' | sed 's/^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$/'DATE\1'/g'
I also read about h, H subcommand of sed, but it is not clear enough for me to make it works.
I would appreciate if someone could explain how to properly implement sed h subcommand and how to properly use regex with sed in order to identify COMMA, single quote, year, dash, month, dash, day, dash, single quote. Save this and insert word DATE
Thank you beforehand!
CodePudding user response:
I strongly believe next command properly achieve to identify if a comma is before a single quote and if then is follow by a date like 2019-09-09. In order to insert in between comma and first single quote the word DATE.
sed -i 's/,\'\([0-9]\{4\}\)\-\([0-9][0-9]\)\-\([0-9][0-9]\)\'/,DATE\'\1\-\2\-\3\'/g' test.sql
CodePudding user response:
Assumptions/Understandings:
- the column name isn't important; what's important are strings of the form
'YYYY-MM-DD'
- these strings can occur anywhere in the
values (...)
clause; this means ... - we need to address strings at the beginning of the
values (...)
clause (ie, has a preceding(
) as well as strings elsewhere in thevalues(...)
clause (ie, has a preceding,
) but ... - also need to refrain from adding the
DATE
prefix if the string is embedded within a function (eg,to_timestamp('YYYY-MM-DD')
) - assume all
INSERT
statements are formatted as shown in the sample (eg, a space betweenvalues
and(
; no white space before a'YYYY-MM-DD'
string)
Adding a few scenarios to our input:
$ cat insert.sql
Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2017-08-01','YYYY-MM-DD'),'2019-09-09');
Insert into MYTABLE (DATE_A, DATE_C) values ('2021-09-09',to_timestamp('2018-10-30','YYYY-MM-DD'));
Insert into MYTABLE (DATE_A, MY_INT, DATE_B) values ('2022-08-11','2344','2022-08-06');
One sed
idea:
sed -E "s/( \(|,)('[0-9]{4}-[0-9]{2}-[0-9]{2}')/\1DATE\2/g" insert.sql
Where:
-E
- enable extended regex support (eliminates need to escape parens and braces)"s/..../g"
- wrapsed
script in double quotes to eliminate need to escape single quote within the script( \(|,)
- (1st capture group) space(
or a comma (in this case we need the escape to designate the(
as part of the data)('[0-9]{4}-[0-9]{2}-[0-9]{2}')
- (2nd capture group) our string'YYYY-MM-DD'
\1DATE\2
- inserting theDATE
string between the two capture groups
This generates:
Insert into MYTABLE (DATE_A, DATE_B) values (to_timestamp('2017-08-01','YYYY-MM-DD'),DATE'2019-09-09');
Insert into MYTABLE (DATE_A, DATE_C) values (DATE'2021-09-09',to_timestamp('2018-10-30','YYYY-MM-DD'));
Insert into MYTABLE (DATE_A, MY_INT, DATE_B) values (DATE'2022-08-11','2344',DATE'2022-08-06');
Once satisfied with the results OP can add the -i
flag to have sed
overwrite the input file.