Home > Mobile >  GNU sed regex to fix mySQL db inserts for SQLite
GNU sed regex to fix mySQL db inserts for SQLite

Time:01-03

I am trying to translate a huge mySQL database dump file from mySQL syntax into SQLite syntax.

At https://regex101.com/ I have successfully created a ECMAScript flavor regex to turn something like:

,'foo\'s bar!',

into:

,"foo\'s bar!"

with this regular expression:

/,'([^'] )\\'([^'] )',/"$1\\'$2"/g

testing against this short file:

(1058,'gpl5q0x51349lmdq3e0ijm4k9b6n','Henry\'s_1.csv','text/csv','{\"identified\":true,\"analyzed\":true}',33854,'mUVk0/XGX afIpkrqBm7LQ==','2021-01-06 03:07:23'),
(1059,'xzj8mivsenkakkrurfjytxjsaj1h','Henry\'s_2.csv','text/csv','{\"identified\":true,\"analyzed\":true}',33555,'KfRYqfAWtSIYXZ6oQZyYbA==','2021-01-06 03:07:23'),

Resulting in:

(1058,'gpl5q0x51349lmdq3e0ijm4k9b6n'"Henry\'s_1.csv"'text/csv','{\"identified\":true,\"analyzed\":true}',33854,'mUVk0/XGX afIpkrqBm7LQ==','2021-01-06 03:07:23'),
(1059,'xzj8mivsenkakkrurfjytxjsaj1h'"Henry\'s_2.csv"'text/csv','{\"identified\":true,\"analyzed\":true}',33555,'KfRYqfAWtSIYXZ6oQZyYbA==','2021-01-06 03:07:23'),

but for the life of me I cannot translate this into a GNU sed flavor regex. For example, this command does not make any substitutions in the output:

sed -r s/,'([^'] )\\'([^'] )',/"$1\\'$2"/g <test.sql
...
sed -r s/,'([^'] )\\'([^'] )',/"\1\\'\2"/g <test.sql: doesn't work either.

I have looked for a regex tool online that translates between different flavors of regex but cannot find one that works on GNU sed (shipped with GIT: sed (GNU sed) 4.8). PCRE seems to be close to what sed has but that doesn't work. I tried perl as well, no luck. Anyone know a regex expression that works or a translator tool that works? I am just about ready to write a nodejs program to do this for me.

Also, for extra credit, how can I write a sed script to handle any number of escaped quotes within a quoted string? I have that issue to deal with as well in my DB dump file.

Examples:

'foo\'-bar'  // on instance
'foo\'and\'bar'  // two instances
'foo\'and\'bar\'s on the deck'  // three instances
and so on...

Thanks!

CodePudding user response:

You can use

sed -E "s/,'([^'] )\\\\'([^'] )',/"'"'"\\1\\\\'\\2"'"'/g test.sql

The "s/,'([^'] )\\\\'([^'] )',/"'"'"\\1\\\\'\\2"'"'/g consists of

  • "s/,'([^'] )\\\\'([^'] )',/" - a s/,'([^'] )\\'([^'] )',/ part (inside double quotes, so backslashes need doubling)
  • '"' - a " char (inside single quotes)
  • "\\1\\\\'\\2" - \1\\'\2 pattern (inside double quotes, so backslashes are doubled)
  • '"' - a " char (inside single quotes)
  • /g - the global flag (no need quoting here).

CodePudding user response:

First look at your command

sed -r s/,'([^'] )\\'([^'] )',/"\1\\'\2"/g test.sql

I prefer writing the whole sed command in single quotes. When you need a single quote, you must close the string ('), use an escaped single quote (\') and open the next string with a ', all joined: '\''.
I also added two , characters.

sed -r 's/,'\''([^'\''] )\\'\''([^'\''] )'\'',/,"\1\\'\''\2",/g' test.sql
# Shorter
sed -r 's/,'\''([^'\''] \\'\''[^'\''] )'\'',/,"\1",/g' test.sql
# Using another way to write the single quotes, with the hex notation
sed -r 's/,\x27([^\x27] \\\x27[^\x27] )\x27,/,"\1",/g' test.sql

This works for simple cases, not for 'foo\'and\'bar\'s on the deck'.

I think you want to replace the quotes in the simple fields too.
Suppose you want to transform

(1058,'gpl5q0x51349lmdq3e0ijm4k9b6n','Henry\'s_1.csv','text/csv','{\"identified\":true,\"analyzed\":true}',33854,'mUVk0/XGX afIpkrqBm7LQ==','2021-01-06 03:07:23'),
(1059,'xzj8mivsenkakkrurfjytxjsaj1h','Henry\'s_2.csv','text/csv','{\"identified\":true,\"analyzed\":true}',33555,'KfRYqfAWtSIYXZ6oQZyYbA==','2021-01-06 03:07:23'),
(2000,'extra credit from question','foo\'and\'bar\'s on the deck','text/csv','{\"identified\":true,\"analyzed\":true}',33999,'KgSBFstbdthdsssssstvbA==','2022-01-02 13:07:23'),

into

(1058,"gpl5q0x51349lmdq3e0ijm4k9b6n","Henry\'s_1.csv","text/csv","{\"identified\":true,\"analyzed\":true}",33854,"mUVk0/XGX afIpkrqBm7LQ==","2021-01-06 03:07:23"),
(1059,"xzj8mivsenkakkrurfjytxjsaj1h","Henry\'s_2.csv","text/csv","{\"identified\":true,\"analyzed\":true}",33555,"KfRYqfAWtSIYXZ6oQZyYbA==","2021-01-06 03:07:23"),
(2000,"extra credit from question","foo\'and\'bar\'s on the deck","text/csv","{\"identified\":true,\"analyzed\":true}",33999,"KgSBFstbdthdsssssstvbA==","2022-01-02 13:07:23"),

In this answer I don't use the '\'' but the hexadecimal notation \x27.
First "backup" the \' combinations (replace them by an unused character like \r), replace all normal quotes by double quotes and "restore the backup" (change back the \r).

sed 's/\\\x27/\r/g; s/\x27/"/g; s/\r/\\\x27/g' test.sql
# or hex value for double quote "
sed 's/\\\x27/\r/g; s/\x27/\x22/g; s/\r/\\\x27/g' test.sql
  • Related