I have a string with 3 capture groups and would like to preserve the first and third but perform a substitution on the second. How do I express this in sed?
Concretely, I have an input string like:
top-level.subpath.one.subpath.two.subpath.forty-five
I want to preserve the part before the first dot (.), shorten each word in the middle part to its first letter, and preserve the part after the last dot. The result should look like:
top-level.s.o.s.t.s.forty-five
For preserving the capture groups, I have:
sed -r 's/([^.]*)(.*)(\..*)/\1...\3/'
which gets me:
top-level....forty-five
For converting something like .subpath.one.subpath.two.subpath to only initials, I have:
sed -r 's/(\.[^.])[^\.]*/\1/g'
which gets me:
.s.o.s.t.s
I'd like to essentially apply that second sed expression to capture group 2. Is there some way I can chain sed substitutions to perform that second substitution on only the second capture group while retaining the first and third?
CodePudding user response:
You can use
sed -E ':a; s/^(.*\.[^.])[^.]+(\.)/\1\2/; ta' file > newfile # GNU sed
sed -E -e :a -e 's/^(.*\.[^.])[^.]+(\.)/\1\2/' -e ta file > newfile # FreeBSD sed
Details:
-E - enables POSIX ERE syntax ((...) is parsed as a grouping construct)
:a - sets an a label
s/^(.*\.[^.])[^.]+(\.)/\1\2/ - finds zero or more chars, a . and then any single char other than a . (capturing this into Group 1), then one or more chars other than a ., and then matches and captures into Group 2 a dot char; the match is replaced with the concatenated Group 1 + Group 2 values
ta - goes to the a label upon successful replacement.
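To make the loop easier to follow, here is a minimal sketch that applies the same single substitution (with the + quantifier in [^.]+) once per iteration from the shell and records every intermediate string. The variable names s, prev and trace are only illustrative, and the shell loop stands in for sed's own :a / ta loop.

```shell
# Emulate sed's ":a; s/.../; ta" loop in the shell to expose each step.
s='top-level.subpath.one.subpath.two.subpath.forty-five'
prev=''
trace=''
# Keep substituting until the string stops changing (what "ta" does inside sed).
while [ "$s" != "$prev" ]; do
  trace="${trace}${s}
"
  prev=$s
  # The greedy .* makes each pass shorten the LAST middle word that is
  # still longer than one character.
  s=$(printf '%s\n' "$s" | sed -E 's/^(.*\.[^.])[^.]+(\.)/\1\2/')
done
printf '%s' "$trace"
```

Because .* is greedy, the words are shortened from right to left: the third subpath first, then two, and so on, until only top-level.s.o.s.t.s.forty-five is left and the substitution no longer matches.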
CodePudding user response:
A simple awk solution that will work with any version of awk, including the one on macOS:
s='top-level.subpath.one.subpath.two.subpath.forty-five'
awk 'BEGIN{FS=OFS="."} {for(i=2;i<NF; ++i) $i=substr($i,1,1)}1' <<< "$s"
top-level.s.o.s.t.s.forty-five
This awk command uses . as both the input and the output field separator. We loop from field 2 to field NF-1 (the second-to-last field) and replace the value of each field with its first character. At the end, the trailing 1 prints the full, rebuilt record.
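As a rough check of how that loop bound behaves on shorter inputs, the sketch below wraps the same awk program in a shell function and feeds it several records. The function name shorten and the extra sample inputs are made up for illustration.

```shell
# Hypothetical wrapper around the awk one-liner; reads records from stdin.
shorten() {
  awk 'BEGIN{FS=OFS="."} {for(i=2;i<NF;++i) $i=substr($i,1,1)}1'
}

# One record per line: the original input plus a few short edge cases.
out=$(printf '%s\n' \
  'top-level.subpath.one.subpath.two.subpath.forty-five' \
  'a.b' \
  'single' \
  'x.yy.z' | shorten)
printf '%s\n' "$out"
```

With one or two fields the loop body never runs (i starts at 2 and must stay below NF), so a.b and single are printed unchanged, while x.yy.z becomes x.y.z.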
A BSD sed solution to do the same:
sed -E -e ':x' -e 's/(.+\..)[^.]+\./\1./; tx' <<< "$s"
top-level.s.o.s.t.s.forty-five
CodePudding user response:
This might work for you (GNU sed):
sed -E ':a;s/(\..*)\B.(.*\.)/\1\2/;ta' file
Capture the first and last periods and hollow out the middle removing any side-by-side word characters.
Ameliorating @anubhava's sed answer:
sed -E 's/(\..)[^.]+\./\1./g;s//\1./g' file
Using the global flag and repeating the same substitution (the empty pattern in s// reuses the last regular expression) provides a two-command solution without a loop.
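The reason one global pass is not enough can be seen by running the two passes separately. This is only a sketch: pass1 and pass2 are illustrative variable names, and the regular expression is spelled out twice instead of using the empty-pattern shorthand.

```shell
s='top-level.subpath.one.subpath.two.subpath.forty-five'

# First pass: each match consumes its trailing dot, so the word right after
# a match has no leading dot left to match against and is skipped.
pass1=$(printf '%s\n' "$s" | sed -E 's/(\..)[^.]+\./\1./g')

# Second pass: the words shortened in pass 1 no longer match, which frees
# the dots in front of the words that were skipped.
pass2=$(printf '%s\n' "$pass1" | sed -E 's/(\..)[^.]+\./\1./g')

printf '%s\n' "$pass1" "$pass2"
```

The first line printed is top-level.s.one.s.two.s.forty-five (every other middle word shortened), and the second is the desired top-level.s.o.s.t.s.forty-five.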