I have XHTML pages that are not well-formed XML because some tags (img, br, hr) are not closed properly.
I need to close the img, hr, and br tags properly, with '/>'.
I tried xmlstarlet; it does the job, but it alters the XML declaration header.
The original code is as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="en" lang="en">
<head>
<title> </title>
<link rel="stylesheet" type="text/css" href="style.css" />
</head>
<body>
If I run the command
xmlstarlet fo --recover --html file.xhtml
the output is incorrect; it has two declaration lines:
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html>
<?xml version="1.0" encoding="UTF-8" standalone="no"??>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="en" lang="en">
<head>
<title> </title>
<link rel="stylesheet" type="text/css" href="style.css"/>
</head>
<body>
If I run
xmlstarlet fo --omit-decl --recover --html file.xhtml
the output is also incorrect, because the XML declaration needs to be the first line:
<!DOCTYPE html>
<?xml version="1.0" encoding="UTF-8" standalone="no"??>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="en" lang="en">
<head>
<title> </title>
<link rel="stylesheet" type="text/css" href="style.css"/>
</head>
<body>
So I need to post-process the output and swap the first and second lines. What bash command can help here? Please give the command syntax for batch processing files and editing them in place.
P.S. Why does xmlstarlet put two question marks at the end of the declaration ("no"??>)?
CodePudding user response:
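I suggest appending | sed -n '1{h;d};2{p;g};p' to the second command (the one with --omit-decl). It swaps the first two lines of the output and prints the rest unchanged.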
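To make the sed script above usable for batch processing, here is a minimal sketch. The function names swap_first_two_lines and fix_xhtml_files are hypothetical (they are not part of the answer), and the wrapper assumes xmlstarlet is installed and prints the DOCTYPE on line 1 and the declaration on line 2, as in the question's second output.

```shell
#!/usr/bin/env bash

# Hypothetical helper: swap lines 1 and 2 of standard input.
swap_first_two_lines() {
    # 1{h;d}  line 1: copy it to the hold space and print nothing yet
    # 2{p;g}  line 2: print it, then load the held line 1 into the pattern space
    # p       print the pattern space (line 1 during cycle 2, later lines as-is)
    sed -n '1{h;d};2{p;g};p'
}

# Hypothetical batch wrapper: reformat each file given as an argument and
# write the result back to the same path.
# Assumption: xmlstarlet is on PATH and accepts the flags used in the question.
fix_xhtml_files() {
    local f tmp
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        xmlstarlet fo --omit-decl --recover --html "$f" | swap_first_two_lines > "$tmp"
        # Overwrite the original only if something was produced, so a failed
        # run does not leave an empty file behind.
        if [ -s "$tmp" ]; then
            cat "$tmp" > "$f"   # redirecting keeps the file's permissions
        fi
        rm -f "$tmp"
    done
}
```

It could then be called as fix_xhtml_files ./*.xhtml. Note that an input with only one line would come out of the sed script empty, which is why the wrapper checks for non-empty output before overwriting.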
CodePudding user response:
This might work for you (GNU sed):
sed -zE 's/(.*)\n(.*)/\2\n\1/m' file
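With -z, the whole file is slurped into memory as one record, and the substitution swaps the contents of lines 1 and 2.
N.B. The m flag keeps .* from matching across newlines, so each .* refers to the contents of a single line.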
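For editing several files in place, the sed -zE command above can be combined with GNU sed's -i option. The following is only a sketch: swap_in_place is a hypothetical name, and it assumes GNU sed (-z and the m flag are GNU extensions) and files that contain no NUL bytes.

```shell
#!/usr/bin/env bash

# Hypothetical helper: swap lines 1 and 2 of every file given as an argument,
# editing each file in place. Assumes GNU sed.
swap_in_place() {
    # -i  edit in place; each file is processed separately
    # -z  use NUL as the record separator, so a text file is one record
    # -E  extended regular expressions, so ( ) group without backslashes
    # m   multiline mode: .* stops at a newline, so the two groups are
    #     exactly line 1 and line 2
    sed -i -zE 's/(.*)\n(.*)/\2\n\1/m' -- "$@"
}
```

For example, swap_in_place ./*.xhtml would be run after the files have been rewritten with xmlstarlet fo --omit-decl --recover --html. Running it twice swaps the lines back, so it should be applied exactly once per file.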