Concatenate XML via Shell script

I have a shell script that concatenates XML files. Each file has the following format:

<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
  <ns2:Revision>2.0.13</ns2:Revision>
   <ns2:Header>
   <ns2:Message>
   ...
   </ns2:Message>
   </ns2:Header>
   <ns2:Body>
   ...
   </ns2:Body>
</DataPDU>

The problem appears after running the concatenation script:

#!/usr/bin/sh
ORIGIN_PATH="/backup/data/export/imatchISO"
HISTORY_PATH="/backup/data/batch/hist"
SEND_PATH="/backup/data/batch/output"
DATE=`date +%y%m%d`
LOG="/backup/data/batch/log/concatIMatch_"$DATE

echo $(date "+%Y-%m-%d %H:%M:%S") >> $LOG
echo "-----------------------------------------------------" >> $LOG
echo "-----------------------------------------------------" >> $LOG
echo "-----STARTING INTELLIMATCH FILES CONTACTENATION------" >> $LOG
echo "-----------------------------------------------------" >> $LOG
echo "-----------------------------------------------------" >> $LOG

echo $(date "+%Y-%m-%d %H:%M:%S")" - Starting..." >> $LOG
echo $(date "+%Y-%m-%d %H:%M:%S")" - Cambiando ruta de trabajo " $ORIGIN_PATH >> $LOG
cd $ORIGIN_PATH
echo $(date "+%Y-%m-%d %H:%M:%S")" - Listando contenido previo al concatenado" >> $LOG
ls -lrt >> $LOG

echo $(date "+%Y-%m-%d %H:%M:%S")" - Comenzando concatenado de ficheros 053"  >> $LOG
cat $ORIGIN_PATH/SWIFTCAMT053_* >> $SEND_PATH/SWIFTCAMT053.XML_$DATE 2>> $LOG

echo $(date "+%Y-%m-%d %H:%M:%S")" - Historificando ficheros..."  >> $LOG
mv $ORIGIN_PATH/SWIFTCAMT053_* $HISTORY_PATH >> $LOG 2>> $LOG


if [[ $(ls -A $SEND_PATH/SWIFTCAMT053.XML_$DATE) ]]; then
    echo $(date "+%Y-%m-%d %H:%M:%S")" - Ficheros 053 concatenados"  >> $LOG
        mv $SEND_PATH/SWIFTCAMT053.XML_$DATE $SEND_PATH/SWIFTCAMT053.XML 2>> $LOG
        exit 0
else
    echo $(date "+%Y-%m-%d %H:%M:%S")" - ¡ERROR CON LOS FICHEROS 053 AL CONCATENAR!"  >> $LOG
        exit 1
fi

echo $(date "+%Y-%m-%d %H:%M:%S")" - FIN"  >> $LOG

The final structure of the obtained file is:

<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
    <ns2:Revision>2.0.13</ns2:Revision>
    <ns2:Header>
    <ns2:Message>
    ...
    </ns2:Message>
    </ns2:Header>
    <ns2:Body>
    ...
    </ns2:Body>
    </DataPDU>
    <DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
        <ns2:Revision>2.0.13</ns2:Revision>
        <ns2:Header>
        <ns2:Message>
        ...
        </ns2:Message>
        </ns2:Header>
        <ns2:Body>
        ...
        </ns2:Body>
        </DataPDU>
    <DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
    <ns2:Revision>2.0.13</ns2:Revision>
    <ns2:Header>
    <ns2:Message>
    ...
    </ns2:Message>
    </ns2:Header>
    <ns2:Body>
    ...
    </ns2:Body>
</DataPDU>

which is not valid, because every file's opening lines are repeated:

<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
  <ns2:Revision>2.0.13</ns2:Revision>

If I remove these manually, the XML file works, but how can I do this from the shell script?

The XML I need is:

<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
    <ns2:Revision>2.0.13</ns2:Revision>
        <ns2:Header>
        ...
        </ns2:Header>
        
        <ns2:Body>
        ...
        </ns2:Body>

        <ns2:Header>
        ...
        </ns2:Header>
        
        <ns2:Body>
        ...
        </ns2:Body>

</DataPDU>

So I need a single opening DataPDU tag, with the content of all the other files inside it.

The issue with my current shell script is that it appends the content of each XML file directly below the previous one, when what I need is to have them all within the same XML tags.

CodePudding user response：

While I would recommend using a proper XML utility per @tripleee in the comments, if you can use awk you may be able to use the following:

$ awk 'NR<3 {print} FNR>3 {print last} {last=$0} END{print}' *.xml

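NR<3 {print}: prints the first two lines processed, i.e. the opening DataPDU tag and the Revision line of the first file.

FNR>3 {print last} {last=$0}: prints lines 3 through n-1 of each file processed, which drops the first two lines and the closing tag of every file.

END{print}: prints the last line processed, i.e. the closing tag of the last file.

This relies on every file having the opening tag on line 1, the Revision on line 2 and the closing tag on its last line.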
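The same logic can be written out with comments, which may make it easier to adapt. This is only a sketch under the same layout assumption as the one-liner (opening tag on line 1, Revision on line 2, closing tag on the last line of each file). The function name merge_pdu is made up for illustration, and END prints the saved line rather than relying on $0 still holding the last record, which some older awk implementations may not do.

```shell
#!/bin/sh
# merge_pdu FILE...   (hypothetical helper name, not part of any tool)
# Assumes each input file has <DataPDU ...> on line 1, <ns2:Revision> on
# line 2, and </DataPDU> as its last line.
merge_pdu() {
    awk '
        NR < 3  { print }        # first two lines of the first file only
        FNR > 3 { print last }   # past line 3 of the current file: emit the previous line
                { last = $0 }    # remember the current line for the next record
        END     { print last }   # last line of the last file (the closing tag)
    ' "$@"
}
```

Called as merge_pdu *.xml, it writes the merged document to standard output, so it can be redirected to a file in the same way as the one-liner.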

Output:

$ awk 'NR<3 {print} FNR>3 {print last} {last=$0} END{print}' *.xml
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
  <ns2:Revision>2.0.13</ns2:Revision>
   <ns2:Header>
   <ns2:Message>
   m1
   </ns2:Message>
   </ns2:Header>
   <ns2:Body>
   b1
   </ns2:Body>
   <ns2:Header>
   <ns2:Message>
   m2
   </ns2:Message>
   </ns2:Header>
   <ns2:Body>
   b2
   </ns2:Body>
   <ns2:Header>
   <ns2:Message>
   m3
   </ns2:Message>
   </ns2:Header>
   <ns2:Body>
   b3
   </ns2:Body>
</DataPDU>

Contents of *.xml files:

$ cat *.xml
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
  <ns2:Revision>2.0.13</ns2:Revision>
   <ns2:Header>
   <ns2:Message>
   m1
   </ns2:Message>
   </ns2:Header>
   <ns2:Body>
   b1
   </ns2:Body>
</DataPDU>
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
  <ns2:Revision>2.0.13</ns2:Revision>
   <ns2:Header>
   <ns2:Message>
   m2
   </ns2:Message>
   </ns2:Header>
   <ns2:Body>
   b2
   </ns2:Body>
</DataPDU>
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
  <ns2:Revision>2.0.13</ns2:Revision>
   <ns2:Header>
   <ns2:Message>
   m3
   </ns2:Message>
   </ns2:Header>
   <ns2:Body>
   b3
   </ns2:Body>
</DataPDU>

You could also place the awk command into a bash script that accepts 2 parameters: the source xml directory and the name of the file to hold the concatenated XML.

Script contents:

#!/bin/bash

xml_dir="$1"
output_xml_file="$2"

awk 'NR<3 {print} FNR>3 {print last} {last=$0} END{print}' "$xml_dir/"*.xml > "$output_xml_file"
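If the script is meant to run unattended, as the batch job in the question does, it may help to fail early when an argument is missing or when no files match the pattern. The following is a rough sketch of that idea, not a drop-in replacement; the function name concat_xml and the error messages are invented for the example.

```shell
#!/bin/sh
# concat_xml XML_DIR OUTPUT_FILE   (hypothetical name; mirrors the two-parameter script)
concat_xml() {
    [ "$#" -eq 2 ] || { echo "usage: concat_xml XML_DIR OUTPUT_FILE" >&2; return 2; }
    xml_dir=$1
    output_xml_file=$2

    # Expand the glob into the positional parameters. If nothing matches,
    # the pattern is left unexpanded and the -e test below fails.
    set -- "$xml_dir"/*.xml
    [ -e "$1" ] || { echo "no .xml files found in $xml_dir" >&2; return 1; }

    awk 'NR<3 {print} FNR>3 {print last} {last=$0} END{print last}' "$@" > "$output_xml_file"
}
```

With this shape, a calling script can test the return status before moving or archiving the source files.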

Script usage:

# before script is run the output.xml file does not exist
$ ls output.xml
ls: output.xml: No such file or directory

# execute the script passing 2 parameters
$ ./script /tmp/t_dir output.xml

# after script execution, the output.xml file now exists and contains the concatenated xml
$ ls output.xml 
output.xml
$ cat output.xml
<DataPDU xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
  <ns2:Revision>2.0.13</ns2:Revision>
   <ns2:Header>
   <ns2:Message>
   m1
   </ns2:Message>
   </ns2:Header>
   <ns2:Body>
   b1
   </ns2:Body>
   <ns2:Header>
   <ns2:Message>
   m2
   </ns2:Message>
   </ns2:Header>
   <ns2:Body>
   b2
   </ns2:Body>
   <ns2:Header>
   <ns2:Message>
   m3
   </ns2:Message>
   </ns2:Header>
   <ns2:Body>
   b3
   </ns2:Body>
</DataPDU>
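Because the awk approach depends on line positions rather than on XML structure, a quick sanity check on the result may be worthwhile before the file is sent on. The sketch below only counts tag lines and is not a real XML validation; the helper name has_single_root is hypothetical.

```shell
#!/bin/sh
# has_single_root FILE   (hypothetical helper)
# Succeeds only if FILE has exactly one line with an opening DataPDU tag
# and exactly one line with a closing DataPDU tag. Heuristic only.
has_single_root() {
    [ -r "$1" ] || return 1
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    opens=$(grep -c '<DataPDU' "$1" || true)
    closes=$(grep -c '</DataPDU>' "$1" || true)
    [ "$opens" -eq 1 ] && [ "$closes" -eq 1 ]
}
```

For example, has_single_root output.xml || echo "unexpected structure" >&2. If libxml2 is installed, xmllint --noout output.xml gives a proper well-formedness check instead.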