I have a shell script that concatenates XML files. The format of the XML files is the following:
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
<ns2:Revision>2.0.13</ns2:Revision>
<ns2:Header>
<ns2:Message>
...
</ns2:Message>
</ns2:Header>
<ns2:Body>
...
</ns2:Body>
</DataPDU>
The problem appears after running the concatenation script:
#!/usr/bin/sh
ORIGIN_PATH="/backup/data/export/imatchISO"
HISTORY_PATH="/backup/data/batch/hist"
SEND_PATH="/backup/data/batch/output"
DATE=`date +%y%m%d`
LOG="/backup/data/batch/log/concatIMatch_"$DATE
echo $(date "+%Y-%m-%d %H:%M:%S") >> $LOG
echo "-----------------------------------------------------" >> $LOG
echo "-----------------------------------------------------" >> $LOG
echo "-----STARTING INTELLIMATCH FILES CONCATENATION-------" >> $LOG
echo "-----------------------------------------------------" >> $LOG
echo "-----------------------------------------------------" >> $LOG
echo $(date "+%Y-%m-%d %H:%M:%S")" - Starting..." >> $LOG
echo $(date "+%Y-%m-%d %H:%M:%S")" - Changing working directory to " $ORIGIN_PATH >> $LOG
cd $ORIGIN_PATH
echo $(date "+%Y-%m-%d %H:%M:%S")" - Listing contents before concatenation" >> $LOG
ls -lrt >> $LOG
echo $(date "+%Y-%m-%d %H:%M:%S")" - Starting concatenation of 053 files" >> $LOG
cat $ORIGIN_PATH/SWIFTCAMT053_* >> $SEND_PATH/SWIFTCAMT053.XML_$DATE 2>> $LOG
echo $(date "+%Y-%m-%d %H:%M:%S")" - Archiving files..." >> $LOG
mv $ORIGIN_PATH/SWIFTCAMT053_* $HISTORY_PATH >> $LOG 2>> $LOG
if [[ $(ls -A $SEND_PATH/SWIFTCAMT053.XML_$DATE) ]]; then
echo $(date "+%Y-%m-%d %H:%M:%S")" - 053 files concatenated" >> $LOG
mv $SEND_PATH/SWIFTCAMT053.XML_$DATE $SEND_PATH/SWIFTCAMT053.XML 2>> $LOG
exit 0
else
echo $(date "+%Y-%m-%d %H:%M:%S")" - ERROR CONCATENATING THE 053 FILES!" >> $LOG
exit 1
fi
echo $(date "+%Y-%m-%d %H:%M:%S")" - END" >> $LOG
The final structure of the obtained file is:
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
<ns2:Revision>2.0.13</ns2:Revision>
<ns2:Header>
<ns2:Message>
...
</ns2:Message>
</ns2:Header>
<ns2:Body>
...
</ns2:Body>
</DataPDU>
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
<ns2:Revision>2.0.13</ns2:Revision>
<ns2:Header>
<ns2:Message>
...
</ns2:Message>
</ns2:Header>
<ns2:Body>
...
</ns2:Body>
</DataPDU>
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
<ns2:Revision>2.0.13</ns2:Revision>
<ns2:Header>
<ns2:Message>
...
</ns2:Message>
</ns2:Header>
<ns2:Body>
...
</ns2:Body>
</DataPDU>
which is not valid, as it duplicates the initial lines
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
<ns2:Revision>2.0.13</ns2:Revision>
(and the closing </DataPDU> tag) for every file. If I manually remove the duplicates, the XML file works, but how can I do this from the shell script?
The XML I need is:
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
<ns2:Revision>2.0.13</ns2:Revision>
<ns2:Header>
...
</ns2:Header>
<ns2:Body>
...
</ns2:Body>
<ns2:Header>
...
</ns2:Header>
<ns2:Body>
...
</ns2:Body>
</DataPDU>
So, a single opening DataPDU tag with the content of all the files inside it.
The issue with my current shell script is that it appends the content of each XML file directly below the previous one, when what I need is to have them all within the same root tags.
Answer:
While I would recommend using a proper XML utility, per @tripleee in the comments, if you can use awk you may be able to use the following:
$ awk 'NR<3 {print} FNR>3 {print last} {last=$0} END{print}' *.xml
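NR<3 {print}
prints the first two lines processed (the opening DataPDU tag and the Revision line of the first file).
FNR>3 {print last} {last=$0}
prints lines 3 through n-1 of each file processed, i.e. everything except the first two lines and the closing </DataPDU> tag.
END{print}
prints the last line processed (the closing </DataPDU> tag of the last file).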
Output:
$ awk 'NR<3 {print} FNR>3 {print last} {last=$0} END{print}' *.xml
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
<ns2:Revision>2.0.13</ns2:Revision>
<ns2:Header>
<ns2:Message>
m1
</ns2:Message>
</ns2:Header>
<ns2:Body>
b1
</ns2:Body>
<ns2:Header>
<ns2:Message>
m2
</ns2:Message>
</ns2:Header>
<ns2:Body>
b2
</ns2:Body>
<ns2:Header>
<ns2:Message>
m3
</ns2:Message>
</ns2:Header>
<ns2:Body>
b3
</ns2:Body>
</DataPDU>
Contents of *.xml files:
$ cat *.xml
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
<ns2:Revision>2.0.13</ns2:Revision>
<ns2:Header>
<ns2:Message>
m1
</ns2:Message>
</ns2:Header>
<ns2:Body>
b1
</ns2:Body>
</DataPDU>
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
<ns2:Revision>2.0.13</ns2:Revision>
<ns2:Header>
<ns2:Message>
m2
</ns2:Message>
</ns2:Header>
<ns2:Body>
b2
</ns2:Body>
</DataPDU>
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
<ns2:Revision>2.0.13</ns2:Revision>
<ns2:Header>
<ns2:Message>
m3
</ns2:Message>
</ns2:Header>
<ns2:Body>
b3
</ns2:Body>
</DataPDU>
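Note that the awk command in this answer relies on line positions: it assumes every file has the DataPDU opening tag on line 1, the Revision on line 2 and the closing tag on its last line. If that is not guaranteed, a variant that matches the tags themselves might be more forgiving. The following is only a sketch; merge_pdu is a hypothetical helper name, and it still assumes that each of those tags sits on a line of its own, as in the sample files.

```shell
#!/bin/sh
# merge_pdu FILE... (hypothetical helper; the name is illustrative only)
# Assumes every wrapper tag sits on its own line, as in the sample files.
merge_pdu() {
    awk '
        # Opening root tag: print it only the first time it is seen.
        /<DataPDU[ >]/   { if (!seen_open++) print; next }
        # Revision line: likewise kept from the first file only.
        /<ns2:Revision>/ { if (!seen_rev++) print; next }
        # Closing root tag: remember it, but do not print it yet.
        /<\/DataPDU>/    { closing = $0; next }
        # Everything else (Header, Body, ...) passes through unchanged.
        { print }
        # Emit a single closing tag once all files have been read.
        END { if (closing != "") print closing }
    ' "$@"
}
```

It would be called with the input files as arguments and the merged document redirected to a file, for example merge_pdu *.xml > merged.xml.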
You could also place the awk command into a bash script that accepts two parameters: the source XML directory and the name of the file to hold the concatenated XML.
Script contents:
#!/bin/bash
xml_dir="$1"
output_xml_file="$2"
awk 'NR<3 {print} FNR>3 {print last} {last=$0} END{print}' "$xml_dir/"*.xml > "$output_xml_file"
Script usage:
# before script is run the output.xml file does not exist
$ ls output.xml
ls: output.xml: No such file or directory
# execute the script passing 2 parameters
$ ./script /tmp/t_dir output.xml
# after script execution, the output.xml file now exists and contains the concatenated xml
$ ls output.xml
output.xml
$ cat output.xml
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
<ns2:Revision>2.0.13</ns2:Revision>
<ns2:Header>
<ns2:Message>
m1
</ns2:Message>
</ns2:Header>
<ns2:Body>
b1
</ns2:Body>
<ns2:Header>
<ns2:Message>
m2
</ns2:Message>
</ns2:Header>
<ns2:Body>
b2
</ns2:Body>
<ns2:Header>
<ns2:Message>
m3
</ns2:Message>
</ns2:Header>
<ns2:Body>
b3
</ns2:Body>
</DataPDU>
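To fit this into the original script, the cat line could be swapped for something along these lines. This is a rough sketch rather than a drop-in replacement: concat_xml, its three arguments and the temporary file name are all made up for illustration. It merges into a temporary file first and only moves it into place when the result is non-empty, which is roughly what the ls -A check in the question is after.

```shell
#!/bin/sh
# concat_xml SRC_DIR PREFIX OUT_FILE
# Hypothetical helper (all names are illustrative). Merges SRC_DIR/PREFIX*
# with the awk one-liner from this answer and replaces OUT_FILE only when
# the merge produced a non-empty result.
concat_xml() {
    src_dir=$1
    prefix=$2
    out_file=$3
    tmp_file="$out_file.tmp.$$"

    # Expand the glob; if nothing matches, "$1" is the unexpanded pattern
    # and does not exist, so stop before calling awk.
    set -- "$src_dir/$prefix"*
    [ -e "$1" ] || return 1

    if ! awk 'NR<3 {print} FNR>3 {print last} {last=$0} END{print}' "$@" > "$tmp_file"
    then
        rm -f "$tmp_file"
        return 1
    fi

    # Publish the merged file only if it actually contains data.
    if [ -s "$tmp_file" ]; then
        mv "$tmp_file" "$out_file"
    else
        rm -f "$tmp_file"
        return 1
    fi
}
```

In the script from the question this might be called as concat_xml "$ORIGIN_PATH" SWIFTCAMT053_ "$SEND_PATH/SWIFTCAMT053.XML", with the return status deciding between exit 0 and exit 1.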