Input:
x.y={aaa b .c}
Note that the content within {} is only an example; in reality it could be any value.
Problem: I would like to keep only the alphanumeric characters within the {}.
So it would become:
x.y={aaabc}
Trial 0
$ echo 'x.y={aaa b .c}' | sed 's/[^[:alnum:]]\+//g'
xyaaabc
This is great, but I'd like to modify only the part within {}. So I thought this may need capture groups, hence I went ahead and tried these:
Trial 1
$ echo 'x.y={aaa b .c}' | sed -E 's/x.y=\{(.*)\}/x.y={\1}/'
x.y={aaa b .c}
Here I have captured the content I want to modify (aaa b .c) correctly, but I need a way to somehow do s/[^[:alnum:]]\+//g only on \1.
Instead, I tried capturing only the alphanumeric characters (into \1) like this:
Trial 2
$ echo 'x.y={aaa b .c}' | sed -E 's/x.y=\{([[:alnum:]]+)\}/x.y={\1}/'
x.y={aaa b .c}
Of course, it doesn't work because I'm only expecting alnum's and then immediately a } literal. I didn't tell it to ignore the non-alnum's. I.e., this part:
s/x.y=\{([[:alnum:]]+)\}/x.y={\1}/
      ^^^^^^^^^^^^^^^^^^
It literally matches: an open brace, some alnum's, and a closing brace -- which is not what I want. I'd like it to match everything, but only capture the alnum's.
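A quick way to confirm that diagnosis is sketched below. It assumes the `+` quantifier after the bracket expression in Trial 2, and it uses `<...>` in the replacement only as a visual marker for a successful match. The same pattern rewrites the line when the braces hold nothing but alphanumerics and leaves it untouched otherwise.

```shell
# Same pattern as Trial 2; the <...> replacement is only a visual marker.
pat='s/x.y=\{([[:alnum:]]+)\}/x.y=<\1>/'

# Braces contain only alnum characters: the pattern matches.
only_alnum=$(echo 'x.y={aaabc}' | sed -E "$pat")
echo "$only_alnum"

# Braces contain a space and a dot: no match, so the line is printed unchanged.
mixed=$(echo 'x.y={aaa b .c}' | sed -E "$pat")
echo "$mixed"
```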
Example of input/output:
x.y={aaa b .c} blah
blah
x.y={1 2 3 def} blah
blah
to
x.y={aaabc} blah
blah
x.y={123def} blah
blah
I searched the web before finally giving up and posting the question, but I didn't find anything helpful, as I didn't see anyone with a problem similar to mine. I would appreciate some help with this, as I'd love to have a better understanding of variables in regex/sed. Thanks!
CodePudding user response:
With your shown samples, please try the following in awk. Written and tested in GNU awk.
awk '
match($0,/\{[^}]*}/){
  val=substr($0,RSTART,RLENGTH)
  gsub(/[^{}a-zA-Z0-9]/,"",val)
  $0=substr($0,1,RSTART-1) val substr($0,RSTART+RLENGTH)
}
1
' Input_file
Explanation: Adding a detailed explanation for the above.
awk '                               ##Starting awk program from here.
match($0,/\{[^}]*}/){               ##Using match function of awk to match from { to first occurrence of }
  val=substr($0,RSTART,RLENGTH)     ##Creating val which holds the substring matched by the regex.
  gsub(/[^{}a-zA-Z0-9]/,"",val)     ##Globally removing everything apart from { } letters and digits in val.
  $0=substr($0,1,RSTART-1) val substr($0,RSTART+RLENGTH)  ##Rebuilding the line: text before the match, val, text after the match.
}
1                                   ##Printing the current line, whether or not it was edited above.
' Input_file                        ##Mentioning Input_file name here.
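To see what `match()` hands back before the line is reassembled, here is a small sketch that prints `RSTART`, `RLENGTH` and the matched substring for one of the sample lines. It reads from a pipe rather than `Input_file`, and the variable name `info` is only for illustration.

```shell
# Print what match() reports for one sample line (no file needed).
info=$(printf '%s\n' 'x.y={aaa b .c} blah' |
  awk 'match($0, /\{[^}]*}/) {
         # RSTART: 1-based index of the opening brace; RLENGTH: length of the whole group
         print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
       }')
echo "$info"   # 5 10 {aaa b .c}
```

The text before the match is `substr($0,1,RSTART-1)` and the text after it starts at index `RSTART+RLENGTH`, which is what the reassembly step relies on.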
Generic solution: In case you have multiple occurrences of { and } on a line, then try the following awk code.
awk '
{
  line=""
  while(match($0,/\{[^}]*}/)){
    val=substr($0,RSTART,RLENGTH)
    gsub(/[^{}a-zA-Z0-9]/,"",val)
    line=(line?line:"") (substr($0,1,RSTART-1) val)
    $0=substr($0,RSTART+RLENGTH)
  }
  if(RSTART+RLENGTH!=length($0)){
    $0=line $0
  }
  else{
    $0=line
  }
}
1
' Input_file
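A quick check of the loop idea on a line with two brace groups. This is a condensed sketch rather than the exact program above: the helper name `strip_in_braces` and the sample line are made up for illustration, and the character class includes `0-9` so that digits are kept.

```shell
# Hypothetical helper: applies the loop-based approach to standard input.
strip_in_braces() {
  awk '{
    out = ""
    while (match($0, /\{[^}]*}/)) {
      val = substr($0, RSTART, RLENGTH)        # one brace group, braces included
      gsub(/[^{}a-zA-Z0-9]/, "", val)          # keep braces, letters and digits
      out = out substr($0, 1, RSTART-1) val    # text before the group, then the cleaned group
      $0 = substr($0, RSTART+RLENGTH)          # continue with the rest of the line
    }
    print out $0                               # append whatever follows the last group
  }'
}

two_groups=$(printf '%s\n' 'a={1 2} b={x.y} end' | strip_in_braces)
echo "$two_groups"   # a={12} b={xy} end
```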
CodePudding user response:
With sed (tested on GNU sed, syntax may vary for other implementations):
$ sed -E ':a s/(\{[[:alnum:]]*)[^[:alnum:]}]+([^}]*})/\1\2/; ta' ip.txt
x.y={aaabc} blah
blah
x.y={123def} blah
blah
- :a marks that location as label a (used to jump using ta as long as the substitution succeeds)
- (\{[[:alnum:]]*) matches { followed by zero or more alnum characters
- [^[:alnum:]}]+ matches one or more characters that are neither alnum nor } (excluding } keeps the match from running past the closing brace when a line has more than one {} group)
- ([^}]*}) matches till the next } character
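To watch the loop work, the sketch below lets the shell do the repetition instead of `:a` / `ta`, so each intermediate line can be printed. It assumes the middle bracket expression is written as `[^[:alnum:]}]+` (one or more characters that are neither alnum nor `}`); the variable names `s` and `next` are only for illustration.

```shell
# One substitution per pass, repeated by the shell instead of :a / ta,
# so that each intermediate line can be printed.
s='x.y={aaa b .c} blah'
while :; do
  next=$(printf '%s\n' "$s" |
    sed -E 's/(\{[[:alnum:]]*)[^[:alnum:]}]+([^}]*})/\1\2/')
  [ "$next" = "$s" ] && break   # nothing changed: the substitution no longer matches
  echo "$next"
  s=$next
done
```

Each pass removes one run of unwanted characters from inside the braces, and the loop stops once only alnum characters remain between { and }.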
If perl is okay:
$ perl -pe 's/\{\K[^}]+(?=\})/$&=~s|[^a-z\d]+||gir/e' ip.txt
x.y={aaabc} blah
blah
x.y={123def} blah
blah
- \{\K[^}]+(?=\}) matches the sequence from { to } (assuming } cannot occur in between)
  - \{\K and (?=\}) are used to keep the braces from being part of the matched portion
- e flag allows you to use Perl code in the replacement portion, in this case another substitute command
- $&=~s|[^a-z\d]+||gir here, $& refers to the entire matched portion, gi flags are used for global/case-insensitive matching and r flag is used to return the value of this substitution instead of modifying $&
  - [^a-z\d]+ matches non-alphanumeric characters (assuming ASCII, you can also use [^[:alnum:]]+)
  - use \W+ if you want to preserve underscores as well
For both solutions, you can add the x\.y= prefix if needed to narrow the scope of matching.
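As a sketch of that narrowing with sed: the second brace group `z={c d}` is made up for illustration, and the middle bracket expression is assumed to exclude `}` so that a match cannot run past the closing brace of the `x.y=` group.

```shell
# Only the group that follows the literal prefix x.y= is cleaned;
# separate -e options keep the label syntax portable across sed implementations.
narrowed=$(printf '%s\n' 'x.y={a b} z={c d}' |
  sed -E -e ':a' \
         -e 's/(x\.y=\{[[:alnum:]]*)[^[:alnum:]}]+([^}]*})/\1\2/' \
         -e 'ta')
echo "$narrowed"   # x.y={ab} z={c d}
```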
CodePudding user response:
Here is another gnu-awk solution using FPAT:
s='x.y={aaa b .c}'
awk -v OFS= -v FPAT='{[^}]+}|[^{}]+' '
{
   for (i=1; i<=NF; ++i)
      if ($i ~ /^{/) $i = "{" gensub(/[^[:alnum:]]+/, "", "g", $i) "}"
} 1' <<< "$s"
x.y={aaabc}