I have a file containing directory entries in the following format:
<item><ln></ln><fn>Some person</fn><ct>07123456789</ct><sd>37</sd><rt>1</rt><bw>1</bw></item>
I would like to use sed to find every entry where <ct> is an 11-digit number and <bw> is 1 (that is, <bw>1</bw>). I would like to change the line above like so:
<item><ln></ln><fn>Some person</fn><ct>07123456789</ct><sd>37</sd><rt>1</rt><bw>0</bw></item>
(In case it isn't obvious, I have changed <bw> to 0.)
I have tried the following sed command, but it does not match:
sed -E 's/(.+<ct>\d{11}.+<bw>)1(<\/bw><\/item>)/\10\2/g' test-directory.xml
What am I doing wrong?
CodePudding user response:
You may use this sed command with 2 capture groups:
sed -E 's~(.*<ct>[0-9]{11}</ct>.*<bw>)1(</bw>.*)~\10\2~' file
<item><ln></ln><fn>Some person</fn><ct>07123456789</ct><sd>37</sd><rt>1</rt><bw>0</bw></item>
More info:
(.*<ct>[0-9]{11}</ct>.*<bw>): Match any text, followed by <ct>, 11 digits and </ct>, followed by any text and <bw>, and capture all of it in group #1
1: Match the literal 1
(</bw>.*): Match </bw> followed by anything, captured in group #2
PS: This assumes the <ct> tag appears before the <bw> tag on the same line. For more refined control over XML, it is better to use an XML parser than shell utilities.
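As a rough check of the command above, here is a minimal sketch that runs it against a small made-up sample. The temporary file and its three entries are assumptions for illustration only. It uses [0-9] rather than \d because \d is a Perl-style shorthand that POSIX extended regular expressions do not define, so many sed builds will not treat it as a digit class; that is probably why the attempt in the question did not match.

```shell
#!/bin/sh
# Hypothetical sample data: only the first entry has an 11-digit <ct> AND <bw>1</bw>.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<item><ln></ln><fn>A</fn><ct>07123456789</ct><sd>37</sd><rt>1</rt><bw>1</bw></item>
<item><ln></ln><fn>B</fn><ct>0712345</ct><sd>38</sd><rt>1</rt><bw>1</bw></item>
<item><ln></ln><fn>C</fn><ct>07123456780</ct><sd>39</sd><rt>1</rt><bw>0</bw></item>
EOF

# In the replacement, \10 is backreference \1 followed by a literal 0
# (sed backreferences are a single digit, \1 to \9).
result=$(sed -E 's~(.*<ct>[0-9]{11}</ct>.*<bw>)1(</bw>.*)~\10\2~' "$demo")
printf '%s\n' "$result"

rm -f "$demo"
```

Only entry A should come back with <bw>0</bw>; B (7-digit <ct>) and C (already 0) are passed through unchanged.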
If the position of the <bw> tag is not fixed, you may use this sed solution, which first selects lines containing an 11-digit <ct> and only then substitutes:
sed -E '\~<ct>[0-9]{11}</ct>~ s~(.*<bw>)1(</bw>.*)~\10\2~' file
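To illustrate the address form, here is a small sketch where <bw> comes before <ct>. The sample entries are hypothetical and shortened for readability. The leading \~...~ is an address with a custom delimiter, so the substitution only runs on lines that contain an 11-digit <ct>.

```shell
#!/bin/sh
# Hypothetical sample data with <bw> placed BEFORE <ct>.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<item><bw>1</bw><fn>A</fn><ct>07123456789</ct></item>
<item><bw>1</bw><fn>B</fn><ct>0712</ct></item>
EOF

# Address: lines containing <ct> with exactly 11 digits.
# Command: flip <bw>1</bw> to <bw>0</bw> wherever it sits on that line.
result=$(sed -E '\~<ct>[0-9]{11}</ct>~ s~(.*<bw>)1(</bw>.*)~\10\2~' "$demo")
printf '%s\n' "$result"

rm -f "$demo"
```

Entry A should be changed even though <bw> precedes <ct>, while entry B is left alone because its <ct> has only 4 digits.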
CodePudding user response:
With awk (in case you are OK with it), you could try the following solution, written and tested in GNU awk with the shown samples. A simple explanation: it uses awk's match function with the regex (.*<ct>[0-9]{11}<\/ct>.*<bw>)([0-9]+)(<\/bw>.*), which creates 3 capture groups and stores their values, indexed by group number, in an array named arr. Once that is done, it prints only the required parts, replacing whatever digits come before </bw> with 0.
awk '
# GNU awk only: the third argument to match() stores the capture groups in arr
match($0,/(.*<ct>[0-9]{11}<\/ct>.*<bw>)([0-9]+)(<\/bw>.*)/,arr){
  print arr[1]"0"arr[3]
}
' Input_file
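Note that the program above prints only the lines that match, and it rewrites any digits inside <bw>, not just 1. If you need the rest of the file kept and do not have GNU awk, the following is a sketch of a variant (not the method above) that relies only on POSIX awk features: two-argument match(), RLENGTH and sub(). The sample data is hypothetical. It checks the digit count via the match length instead of {11}, on the assumption that some awk builds lack interval expressions.

```shell
#!/bin/sh
# Hypothetical sample data: A qualifies, B has a 7-digit <ct>.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<item><ln></ln><fn>A</fn><ct>07123456789</ct><sd>37</sd><rt>1</rt><bw>1</bw></item>
<item><ln></ln><fn>B</fn><ct>0712345</ct><sd>38</sd><rt>1</rt><bw>1</bw></item>
EOF

result=$(awk '
# "<ct>" is 4 chars, "</ct>" is 5 chars, so 11 digits give a match length of 20.
match($0, /<ct>[0-9]+<\/ct>/) && RLENGTH == 20 {
    sub(/<bw>1<\/bw>/, "<bw>0</bw>")
}
# Print every line, changed or not.
{ print }
' "$demo")
printf '%s\n' "$result"

rm -f "$demo"
```

This assumes at most one <ct> element per line, since match() only reports the leftmost match.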