Use grep, xargs, and sed to regenerate UUIDs in a file more efficiently

Time:11-13

I am able to successfully replace UUIDs in a file with freshly generated UUIDs:

FILE=/home/username/sql_inserts_with_uuid.sql
grep -i -o -E "([a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89aAbB][a-f0-9]{3}-[a-f0-9]{12})" "$FILE" | xargs -I {} sed -i "s/{}/$(uuidgen -t)/g" "$FILE"

But it's slow because it rewrites the whole file once for every UUID it replaces. Is there a more efficient way to replace every UUID in a single pass instead of rewriting the same file over and over?

Save this sample data in a file to test:

INSERT INTO fake_table (uuid) VALUES ('812ab76e-43ca-11ec-b54f-00d8617c2296');
INSERT INTO fake_table (uuid) VALUES ('854f7b36-43ca-11ec-9608-00d8617c2296');
INSERT INTO fake_table (uuid) VALUES ('8a09444a-43ca-11ec-8ae2-00d8617c2296');
INSERT INTO fake_table (uuid) VALUES ('8cd0da58-43ca-11ec-9811-00d8617c2296');
INSERT INTO fake_table (uuid) VALUES ('8f9889c0-43ca-11ec-8bfc-00d8617c2296');

CodePudding user response:

You can use awk with a system call to replace them all in one pass:

awk '
# UUID pattern: 8-4-4-4-12 hex groups (version 1 or 4, any case)
BEGIN{pat="[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[0-9][a-fA-F0-9]{3}-[89aAbB][a-fA-F0-9]{3}-[a-fA-F0-9]{12}"}

# Ask the system for a fresh UUID; close() the pipe so uuidgen re-runs each call
function get_uuid(){
    cmd = "uuidgen"
    cmd | getline uuid
    close(cmd)
    return uuid
}

# Replace the first UUID on each matching line, then print every line
$0~pat{
    uuid=get_uuid()
    sub(pat,uuid,$0)
} 1
' file.txt

Prints:

INSERT INTO fake_table (uuid) VALUES ('473C4331-CC31-4FD0-AE99-37FA7E5F23CF');
INSERT INTO fake_table (uuid) VALUES ('EBEC05AB-4236-4384-AF7A-76D4A0615599');
INSERT INTO fake_table (uuid) VALUES ('23740143-6CC1-41FC-8AE7-038810291026');
INSERT INTO fake_table (uuid) VALUES ('7DBF25AF-4E85-4C55-B8CA-0F6150D5DD3C');
INSERT INTO fake_table (uuid) VALUES ('4365127B-EB46-414E-92D4-B48CC211489E');

With GNU awk, you can make the replacements in place. Otherwise, you need to redirect the output to a temp file and then mv the temp file on top of the source file. This sounds harder than it actually is.
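As a sketch (the file names `replace_uuids.awk` and `file.txt` are illustrative; the awk program from above is written out via a here-doc so the example is self-contained):

```shell
# Save the one-pass awk program from the answer as replace_uuids.awk
cat > replace_uuids.awk <<'EOF'
BEGIN{pat="[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[0-9][a-fA-F0-9]{3}-[89aAbB][a-fA-F0-9]{3}-[a-fA-F0-9]{12}"}
function get_uuid(){
    cmd = "uuidgen"
    cmd | getline uuid
    close(cmd)
    return uuid
}
$0~pat{
    uuid=get_uuid()
    sub(pat,uuid,$0)
} 1
EOF

# A one-line input file to demonstrate on
printf "INSERT INTO fake_table (uuid) VALUES ('812ab76e-43ca-11ec-b54f-00d8617c2296');\n" > file.txt

# Portable "in-place" edit: write to a temp file, then move it over the original
awk -f replace_uuids.awk file.txt > file.txt.tmp && mv file.txt.tmp file.txt

# With GNU awk 4.1+, the inplace extension manages the temp file for you:
# gawk -i inplace -f replace_uuids.awk file.txt
```

After this runs, `file.txt` holds the same SQL with a fresh UUID substituted in.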


Speed test: multiplying your example file up to 10,000 UUID replacements, the file is processed in 21 seconds on my computer, and in 26 ms if the same file has no matches to replace. The system call is not free in terms of efficiency, but this is still likely much faster than rewriting the file once per UUID.
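To reproduce a benchmark file of that size, repeat the question's five sample lines 2,000 times (a sketch; `sample.sql` and `big.sql` are illustrative names):

```shell
# Write the 5 sample INSERTs from the question
cat > sample.sql <<'EOF'
INSERT INTO fake_table (uuid) VALUES ('812ab76e-43ca-11ec-b54f-00d8617c2296');
INSERT INTO fake_table (uuid) VALUES ('854f7b36-43ca-11ec-9608-00d8617c2296');
INSERT INTO fake_table (uuid) VALUES ('8a09444a-43ca-11ec-8ae2-00d8617c2296');
INSERT INTO fake_table (uuid) VALUES ('8cd0da58-43ca-11ec-9811-00d8617c2296');
INSERT INTO fake_table (uuid) VALUES ('8f9889c0-43ca-11ec-8bfc-00d8617c2296');
EOF

# Repeat it 2,000 times for a 10,000-line benchmark file
for i in $(seq 2000); do cat sample.sql; done > big.sql
wc -l < big.sql    # 10000
```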

CodePudding user response:

In plain bash (save the following script as `new_uuids`):

#!/bin/bash

hex='[[:xdigit:]]'
hex3="$hex$hex$hex"
hex4="$hex3$hex"
hex8="$hex4$hex4"
hex12="$hex8$hex4"
pat="$hex8-$hex4-[0-9]$hex3-[89aAbB]$hex3-$hex12"

while IFS= read -r line; do
    if [[ $line = *$pat* ]]; then
        echo "${line/$pat/$(uuidgen -t)}"
    else
        echo "$line"
    fi
done

Call it as

./new_uuids < sql_inserts_with_uuid.sql > new_sql_inserts_with_uuid.sql

CodePudding user response:

With pure Ruby (no external dependencies) you can do it really fast, but the UUIDs will be random-based instead of time-based.
If you have Ruby available (or easily installable) on your machine, give it a shot:

filepath=/home/username/sql_inserts_with_uuid.sql

ruby -e '
  require "securerandom"
  File.foreach(ARGV[0]) do |line|
    print line.gsub(/\h{8}-\h{4}-\h{4}-\h{4}-\h{12}/, SecureRandom.uuid)
  end
' "$filepath"

Output:

INSERT INTO fake_table (uuid) VALUES ('da29dd7e-ddb1-40e5-92ec-ae9b67ecc3ac');
INSERT INTO fake_table (uuid) VALUES ('ab70e223-ea84-4c31-883a-203f6d0afba7');
INSERT INTO fake_table (uuid) VALUES ('078940f7-6784-4853-8728-06b2cc6aaa8b');
INSERT INTO fake_table (uuid) VALUES ('28f15746-08cf-4b64-84c6-53123f4a345b');
...
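One subtlety worth noting: with a plain string replacement, `gsub` evaluates `SecureRandom.uuid` once per line, so two UUIDs on the same line would receive the same new value. That is fine for the one-UUID-per-line input above, but the block form of `gsub` generates a fresh UUID per match. A small illustration (the sample line is hypothetical):

```ruby
require "securerandom"

# A line containing two UUIDs
line = "('812ab76e-43ca-11ec-b54f-00d8617c2296'), ('854f7b36-43ca-11ec-9608-00d8617c2296')"

# Block form: the block runs once per match, so each UUID is distinct
out = line.gsub(/\h{8}-\h{4}-\h{4}-\h{4}-\h{12}/) { SecureRandom.uuid }

uuids = out.scan(/\h{8}-\h{4}-\h{4}-\h{4}-\h{12}/)
puts uuids.uniq.length   # 2 distinct UUIDs; the string form would give 1
```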

Timing of different solutions with 10,000 lines of SQL to replace
