How to sort XML records based on the timestamp tag in unix?

Time:11-02

I need to sort XML records by the value of the timestamp tag, locating the tag by name rather than by position, because the tags are dynamic, i.e. the number of tags in a record isn't fixed. So far I've tried the code below, but it fails to give the expected output.

$ sed  -e 's/<timestamp>/& /' file | sort -n -k2 | sed 's/ //g'

XML records:

<data1><Version>101</Version><timestamp>2022-11-01T05:51:33.540</timestamp><newtag>xlc<newtag><name>XXX</name></data1>
<data1><Version>102</Version><timestamp>2022-11-01T05:49:32.511</timestamp><newtag>xlc<newtag><name>BBB</name></data1>
<data1><Version>101</Version><timestamp>2022-11-01T05:54:30.540</timestamp><name>AAA</name></data1>
<data2><Version>102</Version><timestamp>2022-11-01T05:50:33.540</timestamp><newtag>xlc<newtag><name>XXX</name></data2>
<data2><Version>101</Version><timestamp>2022-11-01T05:41:33.540</timestamp><name>YYY</name></data2> 
<data2><Version>102</Version><newtag>xlc<newtag><timestamp>2022-11-01T05:50:12.510</timestamp><name>BBB</name></data2>

Expected output:

<data2><Version>101</Version><timestamp>2022-11-01T05:41:33.540</timestamp><name>YYY</name></data2> 
<data1><Version>102</Version><timestamp>2022-11-01T05:49:32.511</timestamp><newtag>xlc<newtag><name>BBB</name></data1>
<data2><Version>102</Version><newtag>xlc<newtag><timestamp>2022-11-01T05:50:12.510</timestamp><name>BBB</name></data2>
<data2><Version>102</Version><timestamp>2022-11-01T05:50:33.540</timestamp><newtag>xlc<newtag><name>XXX</name></data2>
<data1><Version>101</Version><timestamp>2022-11-01T05:51:33.540</timestamp><newtag>xlc<newtag><name>XXX</name></data1>
<data1><Version>101</Version><timestamp>2022-11-01T05:54:30.540</timestamp><name>AAA</name></data1>

CodePudding user response:

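Your code works correctly if you remove the -n switch from sort. With -n, sort reads only the leading number of the key (2022 in every record), so all keys compare equal and the order falls back to comparing whole lines. Without it, the timestamps are compared as plain strings, which matches chronological order for this fixed-width format: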

sed  -e 's/<timestamp>/& /' file | sort -k2 | sed 's/ //g'
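
One thing to watch with the pipeline above: the final sed removes every space on the line, including any spaces inside tag values. A minimal sketch of a variant that leaves the records untouched is shown below. It prefixes each line with its timestamp as a tab-separated key, sorts on that key, and cuts the key off again. The function name sort_by_timestamp is made up for illustration, and the sketch assumes one record per line, at most one timestamp tag per record, and a sort that supports -s (GNU and BSD sort do, POSIX does not require it).

```shell
# Hypothetical helper: sort one-record-per-line XML by its <timestamp> value.
# Usage: sort_by_timestamp file
sort_by_timestamp() {
    awk '{
        key = ""
        if (match($0, /<timestamp>[^<]*<\/timestamp>/)) {
            # Skip the 11-character opening tag and drop the
            # 12-character closing tag to keep only the value.
            key = substr($0, RSTART + 11, RLENGTH - 23)
        }
        # Decorate: timestamp, a tab, then the unchanged record.
        print key "\t" $0
    }' "$1" |
    # Byte-wise, stable sort on the key column only.
    LC_ALL=C sort -s -t "$(printf '\t')" -k1,1 |
    # Undecorate: remove the key column again.
    cut -f2-
}
```

Records without a timestamp tag get an empty key and end up at the top of the output.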

CodePudding user response:

You can use awk to split each record at <[/]?timestamp>, populate an array with the timestamp as the key, and then sort the array by key. This assumes each record is on a single line.

gawk 'BEGIN{FS="<[/]?timestamp>"}{ts[$2]=$1 "<timestamp>" $2 "</timestamp>" $3}END{ n=asorti(ts,kts); for (i=1;i<=n;i++){ print ts[kts[i]]}}' tmp.xml

Result

<data2><Version>101</Version><timestamp>2022-11-01T05:41:33.540</timestamp><name>YYY</name></data2> 
<data1><Version>102</Version><timestamp>2022-11-01T05:49:32.511</timestamp><newtag>xlc<newtag><name>BBB</name></data1>
<data2><Version>102</Version><newtag>xlc<newtag><timestamp>2022-11-01T05:50:12.510</timestamp><name>BBB</name></data2>
<data2><Version>102</Version><timestamp>2022-11-01T05:50:33.540</timestamp><newtag>xlc<newtag><name>XXX</name></data2>
<data1><Version>101</Version><timestamp>2022-11-01T05:51:33.540</timestamp><newtag>xlc<newtag><name>XXX</name></data1>
<data1><Version>101</Version><timestamp>2022-11-01T05:54:30.540</timestamp><name>AAA</name></data1>
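
Because the array is indexed by timestamp, two records with the same timestamp would overwrite each other and only the last one would be printed. It may be worth checking the input for repeated values first. Here is a rough sketch that uses the same field separator; the function name duplicate_timestamps is made up for illustration, and it also assumes one record per line:

```shell
# Hypothetical helper: print each timestamp that appears on more than one record.
# Usage: duplicate_timestamps file
duplicate_timestamps() {
    # Splitting on the opening or closing tag puts the timestamp value in $2.
    # NF >= 3 skips records that have no complete timestamp tag pair.
    # The counter equals 1 only when a value is seen for the second time,
    # so each repeated timestamp is printed once.
    awk -F '<[/]?timestamp>' 'NF >= 3 && seen[$2]++ == 1 { print $2 }' "$1"
}
```

If this prints nothing, every timestamp is unique and the array approach keeps all records.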