Extracting a specific length substring using SED

I have the following SED command

echo "abcd_2222222233333333_jdkj" | sed -e 's/^\(.*\)_\(.*\)_\(.*\)$/\2_\1_\3/'

that returns

2222222233333333_abcd_jdkj

That's great, but I really want

22222222-33333333_abcd_jdkj

Is this possible with an easy tweak, or do I need some non-sed solution? Basically, I know the number is 16 bytes long, but I need to break it into two 8-byte numbers.

CodePudding user response：

Instead of .* to match any number of characters, you can use .{8} to match exactly eight characters.

The command below also uses sed -r to enable ERE syntax, which requires fewer backslashes and is generally easier to read than the default BRE. (On systems with BSD-style tools, use sed -E instead; current GNU sed accepts -E as well.)

sed -re 's/^(.*)_(.{8})(.*)_(.*)$/\2-\3_\1_\4/' <<<"abcd_2222222233333333_jdkj"
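To make the BRE/ERE difference concrete, here is a small side-by-side sketch of the same substitution in both dialects. The variable names input, bre_out and ere_out are only illustrative, and it assumes a sed that accepts -E:

```shell
input='abcd_2222222233333333_jdkj'

# BRE (sed's default): grouping parentheses and interval braces need backslashes.
bre_out=$(printf '%s\n' "$input" | sed 's/^\(.*\)_\(.\{8\}\)\(.*\)_\(.*\)$/\2-\3_\1_\4/')

# ERE (-E): the same pattern without backslashes on ( ) { }.
ere_out=$(printf '%s\n' "$input" | sed -E 's/^(.*)_(.{8})(.*)_(.*)$/\2-\3_\1_\4/')

# Both lines printed here should be 22222222-33333333_abcd_jdkj.
printf '%s\n%s\n' "$bre_out" "$ere_out"
```

The printf pipe is used instead of <<< only so that the sketch does not depend on bash.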

By the way, I would strongly suggest using [^_]* instead of .* so your regex can't match underscores where you don't want it to (. means "any character"; [^_] means "any character except _"). That's not just a correctness improvement: it can also make your regex faster to evaluate by avoiding backtracking, where the regex engine realizes it has matched too much content and needs to undo some of its prior matches.
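As a rough illustration of the correctness point, the sketch below applies the [^_] form to an input whose last field contains an extra underscore. The function name reorder_fields and the sample input are made up for this example:

```shell
# reorder_fields is a hypothetical helper name, not part of the original command.
reorder_fields() {
  # [^_] cannot match an underscore, so groups 1-3 each stay inside one field;
  # only the final (.*) is allowed to contain underscores.
  printf '%s\n' "$1" | sed -E 's/^([^_]*)_([^_]{8})([^_]*)_(.*)$/\2-\3_\1_\4/'
}

# Extra underscore in the last field.
reorder_fields 'abcd_2222222233333333_jd_kj'
# prints: 22222222-33333333_abcd_jd_kj
```

With the .* version of the pattern, the third group is greedy and runs up to the last underscore, so the same input comes out as 22222222-33333333_jd_abcd_kj.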

Also consider bash's built-in regex support:

string='abcd_2222222233333333_jdkj'
re='([^_]+)_([[:digit:]]{8})([[:digit:]]+)_(.*)'

if [[ $string =~ $re ]]; then
  result=${BASH_REMATCH[2]}-${BASH_REMATCH[3]}_${BASH_REMATCH[1]}_${BASH_REMATCH[4]}
  echo "Result is: $result"
else
  echo "No match found"
fi

CodePudding user response：

The solution based on the tip in the answer above works:

echo "abcd_2222222233333333_jdkj" | sed -e 's/^\(.*\)_\(.\{8\}\)\(.\{8\}\)_\(.*\)$/\2-\3_\1_\4/'