I have been trying to modify a script so that it concatenates the XML files in a directory and merges them into a single XML file. The script was originally used for concatenating text files.
I have the following script:
#!/usr/bin/sh
ORIGIN_PATH="/backup/data/export/imatchISO"
HISTORY_PATH="/backup/data/batch/hist"
SEND_PATH="/backup/data/batch/output"
DATE=`date +%y%m%d`
LOG="/backup/data/batch/log/concatIMatch_"$DATE
cd $ORIGIN_PATH
ls -lrt >> $LOG
cat $ORIGIN_PATH/SWIFTCAMT053_* >> $SEND_PATH/SWIFTCAMT053.XML_$DATE 2>> $LOG
mv $ORIGIN_PATH/SWIFTCAMT053_* $HISTORY_PATH >> $LOG 2>> $LOG
if [[ $(ls -A $SEND_PATH/SWIFTCAMT053.XML_$DATE) ]]; then
echo $(date "+%Y-%m-%d %H:%M:%S")" - Ficheros 053 concatenados" >> $LOG
mv $SEND_PATH/SWIFTCAMT053.XML_$DATE $SEND_PATH/SWIFTCAMT053.XML 2>> $LOG
exit 0
else
echo $(date "+%Y-%m-%d %H:%M:%S")" - ¡ERROR CON LOS FICHEROS 053 AL CONCATENAR!" >> $LOG
exit 1
fi
The directory contains several XML files, all with the same format:
<?xml version="1.0" ?>
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
<ns2:Revision>2.0.13</ns2:Revision>
<ns2:Header>
...
</ns2:Header>
<ns2:Body>
...
</ns2:Body>
<ns2:Header>
...
</ns2:Header>
<ns2:Body>
...
</ns2:Body>
</DataPDU>
The problem is that plain concatenation appends each file to the end of the previous one, which is not the expected result: the XML declaration, the opening <DataPDU> tag and the closing </DataPDU> tag are repeated for every file.
What I need is a single XML file with the following structure:
<?xml version="1.0" ?>
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
<ns2:Revision>2.0.13</ns2:Revision>
<ns2:Header>
...
</ns2:Header>
<ns2:Body>
...
</ns2:Body>
<ns2:Header>
...
</ns2:Header>
<ns2:Body>
...
</ns2:Body>
<ns2:Header>
...
</ns2:Header>
<ns2:Body>
...
</ns2:Body>
<ns2:Header>
...
</ns2:Header>
<ns2:Body>
...
</ns2:Body>
<ns2:Header>
...
</ns2:Header>
<ns2:Body>
...
</ns2:Body>
<ns2:Header>
...
</ns2:Header>
<ns2:Body>
...
</ns2:Body>
</DataPDU>
So technically what I want is to have the first 3 lines and the last line only occurring once.
I have received a tip that I could do something with:
$ awk 'NR<3 {print} FNR>3 {print last} {last=$0} END{print}' *.xml
But I don't understand how to modify my script for this.
Answer:
Use xmllint to process the XML files properly, excluding the Revision element from the second body:
body1=$(xmllint --xpath '/DataPDU/*' tmp.xml | sed -ze 's/\n/\&#10;/g')
body2=$(xmllint --xpath '/DataPDU/*[not(local-name()="Revision")]' tmp.xml | sed -ze 's/\n/\&#10;/g')
printf "%s\n" "cd /DataPDU" "set ${body1}${body2}" "save" "bye" | xmllint --shell tmp.xml
The code uses the same file twice, so change the second file name accordingly.
Plain newlines (\n) are replaced by the equivalent &#10; entity to avoid errors in the xmllint shell.
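The newline replacement can be tried on its own before it is wired into xmllint. This is a minimal sketch, assuming GNU sed (the -z option is a GNU extension); the function name encode_newlines is made up for illustration and does not appear in the answer.

```shell
#!/bin/sh
# encode_newlines: hypothetical helper name, not part of the original answer.
# Assumes GNU sed: -z makes sed treat the whole input as one record,
# so \n in the pattern can match the newlines inside it.
# In the replacement, \& is a literal ampersand (a bare & would mean
# "the matched text"), so each newline becomes the entity &#10;.
encode_newlines() {
    sed -z 's/\n/\&#10;/g'
}

# Usage:
#   printf '<a/>\n<b/>\n' | encode_newlines
# prints <a/>&#10;<b/>&#10; on a single line
```

The result is a single line, which is why the fragment can then be passed as one argument to the set command of the xmllint shell.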
awk can be used too, but it requires that the XML format does not change between files.
The body can be extracted by setting the record separator RS to xmlns:ns2="urn:swift:saa:xsd:saa.2.0"> or </DataPDU>. Record #2 then contains the inner elements.
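# XML declaration and opening tag, identical in every file
echo -e '<?xml version="1.0" ?>\n\t<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">' > output.xml
# concatenate the bodies from all files in one variable
body=""
for f in *.xml; do
# skip the file being written, which also matches *.xml
[ "$f" = "output.xml" ] && continue
# no space before "=", and append instead of overwriting on each iteration
body="$body$(gawk 'BEGIN{ RS="xmlns:ns2=\"urn:swift:saa:xsd:saa.2.0\">|<[/]DataPDU>" } { if(NR == 2) { print $0 }}' "$f")"
done
echo "$body" >> output.xml
# Add closing tag
echo "</DataPDU>" >> output.xml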
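As for the awk one-liner quoted in the question, it can be wrapped in a function and called where the original script calls cat. The following is only a sketch under strict layout assumptions: every file has the XML declaration on line 1, the opening DataPDU tag on line 2, the Revision element on line 3, and the closing tag alone on the last line, with no trailing blank line. The thresholds are shifted from 3 to 4 compared with the tip so that the Revision line is also printed only once. The function name merge_datapdu is made up for illustration.

```shell
#!/bin/sh
# merge_datapdu: hypothetical helper, not part of the original script.
# Layout assumptions (see the paragraph above): line 1 = XML declaration,
# line 2 = opening <DataPDU ...>, line 3 = <ns2:Revision>, last line = </DataPDU>.
merge_datapdu() {
    awk '
        NR < 4  { print }                # three header lines, first file only
        FNR > 4 { print last }           # body lines, printed one line late
        { last = $0 }                    # remember the current line
        END     { if (NR) print last }   # closing tag of the last file, once
    ' "$@"
}

# Possible replacement for the cat line of the original script:
#   merge_datapdu "$ORIGIN_PATH"/SWIFTCAMT053_* > "$SEND_PATH/SWIFTCAMT053.XML_$DATE" 2>> "$LOG"
```

Printing each body line one step late is what keeps the closing tag of every file except the last out of the output. Because this depends on exact line positions, the xmllint approach is the safer choice if the files are not always formatted identically.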