First of all, there might be other (better) options, but I'm bound to sed or awk in this case. I have an XML file with the following contents:
<Field name="field1" type="String">AAAA</Field>
<Field name="field2" type="Integer">0</Field>
<Field name="field4" type="String">BBBB</Field>
Here I would like to change the contents using sed, to get the following result:
<field1>AAAA</field1>
<field2>0</field2>
<field4>BBBB</field4>
So I want to remove the Field name=" part, the closing quote after the name and the rest of the attributes up to the >, and also replace Field in the closing </Field> tag with the actual field name.
How can I approach this with awk or sed?
Removing from the first tag works with
sed 's/ type=".*"//'
and
sed 's/Field name="//'
I'm not sure how to proceed with the replacing of the last one.
CodePudding user response:
Using sed
$ sed -E 's~[A-Z][^"]*"([^"]*)[^>]*([^/]*/)[^>]*~\1\2\1~' input_file
<field1>AAAA</field1>
<field2>0</field2>
<field4>BBBB</field4>
CodePudding user response:
Another sed:
sed -E 's/^[^"]+"([^"]+)("[^"]+){2}">([^<]+).*$/<\1>\3<\/\1>/' file.xml
CodePudding user response:
1st solution: With your shown samples, please try the following sed code. The -E option enables ERE (extended regular expressions). The regex creates capturing groups, and the values captured in those groups are used later in the substitution.
sed -E 's/^<Field name="([^"]*)"[^>]*>([^<]*)<.*$/<\1>\2<\/\1>/' Input_file
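To try the capture groups from the sed command above without a separate input file, a minimal sketch can feed the three sample lines through a here-document. The function name convert_fields and the variable result are made up for illustration.

```shell
# Hypothetical helper: rewrite <Field name="X" ...>V</Field> as <X>V</X>.
convert_fields() {
  # \1 = value of the name attribute, \2 = text between the tags
  sed -E 's/^<Field name="([^"]*)"[^>]*>([^<]*)<.*$/<\1>\2<\/\1>/'
}

# Feed the sample lines from the question on standard input.
result=$(convert_fields <<'EOF'
<Field name="field1" type="String">AAAA</Field>
<Field name="field2" type="Integer">0</Field>
<Field name="field4" type="String">BBBB</Field>
EOF
)
printf '%s\n' "$result"
```

Lines that do not start with <Field name=" do not match the pattern, so sed passes them through unchanged.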
2nd solution: With awk, please try the following code. Written and tested with the shown samples. It sets the field separator to <Field name= (at the start of the line), ", > and < for all lines. The main block prints the 3rd and 7th fields along with the tags, as per the required output.
awk -F'^<Field name=|"|>|<' '{print "<"$3">"$7"</"$3">"}' Input_file
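To see why the 3rd and 7th fields are the ones to print, a small sketch can list every field awk produces for one sample line under the same separator. The variable names line and fields are only for illustration.

```shell
line='<Field name="field1" type="String">AAAA</Field>'

# Print each field with its index. Empty fields show up wherever
# two separators are adjacent, which is why the numbering skips ahead.
fields=$(printf '%s\n' "$line" |
  awk -F'^<Field name=|"|>|<' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')
printf '%s\n' "$fields"
```

In the listing, field 3 holds the name (field1) and field 7 holds the value (AAAA). The fields in between are either empty or carry the type attribute.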
3rd solution: With GNU awk, using its match function with a regex whose capturing groups store the matched values in an array named arr, which is printed later to achieve the goal.
awk '
match($0,/<Field name="([^"]*)"[^"]*"[^>]*>([^<]*)</,arr){
print "<"arr[1]">" arr[2] "</"arr[1]">"
}
' Input_file
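The three-argument form of match is a GNU awk extension. If only a POSIX awk is available, a rough alternative (not the approach above, just a sketch) is the two-argument match with RSTART and RLENGTH plus sub. The function name convert_portable and the variables name and value are assumptions for illustration.

```shell
# Hypothetical helper for awks without the three-argument match().
convert_portable() {
  awk '
  match($0, /name="[^"]*"/) {
    # RSTART and RLENGTH delimit name="..."; skip the 6-character
    # prefix and drop the closing quote to get the bare name.
    name = substr($0, RSTART + 6, RLENGTH - 7)
    value = $0
    sub(/^[^>]*>/, "", value)   # remove the opening tag
    sub(/<.*$/, "", value)      # remove the closing tag
    print "<" name ">" value "</" name ">"
  }'
}

result_portable=$(convert_portable <<'EOF'
<Field name="field1" type="String">AAAA</Field>
<Field name="field2" type="Integer">0</Field>
EOF
)
printf '%s\n' "$result_portable"
```

Like the GNU awk version, this prints only lines where the pattern matches.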
CodePudding user response:
As simple as it is elegant:
awk -F "[\"><]" '{print "<"$3">" $7 "</"$3">"}' input_file
Explanation:
Use ", < and > as delimiters to split each line into fields,
then print the fields you need: the 3rd is the name and the 7th is the value.
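One caveat with splitting on these delimiters is that the print action runs for every line. If the real file has other lines around the Field elements, a guard on the 2nd field can limit the rewrite. The sketch below assumes a wrapping <Record> element, which is not in the question and is purely hypothetical.

```shell
# Hypothetical input: the <Record> wrapper is an assumption for illustration.
sample='<Record>
<Field name="field1" type="String">AAAA</Field>
</Record>'

out=$(printf '%s\n' "$sample" |
  awk -F'["><]' '
    # Rewrite only lines whose 2nd field is the Field name= prefix.
    $2 == "Field name=" { print "<" $3 ">" $7 "</" $3 ">"; next }
    # Pass every other line through unchanged.
    { print }
  ')
printf '%s\n' "$out"
```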