I have this content in a file:
<pre dir="ltr" data-xf-init="code-block" data-lang=""><code>-Fix numcer one/Two
-EMM Support
-Fix update < broken
-Add support patch</code></pre>
</div>
</div><b><br />
I need to strip the surrounding markup and keep only these lines:
-Fix numcer one/Two
-EMM Support
-Fix update < broken
-Add support patch
I have tried this script:
#!/bin/bash
sed -n '/>-/,/</p' /home/Desktop/1 > /home/Desktop/2
sed -n '/^-*code>/p' /home/raed/Desktop/2 > /home/Desktop/3
sed -i 's#</code></pre>##' /home/Desktop/3
exit
But the script drops the first line:
-Fix numcer one/Two
CodePudding user response:
1st solution: Try GNU awk for this one. With your shown samples, please try the following awk code.
awk -v RS="^$" '
match($0,/(^|\n)<pre [^>]*".*<code>(-.*)<\/code>/,arr){
  print arr[2]
}
' Input_file
Explanation: Setting RS to ^$ makes GNU awk read the whole file as a single record. The match function (in its three-argument GNU awk form) then applies the regex (^|\n)<pre [^>]*".*<code>(-.*)<\/code> to that record. The regex has 2 capturing groups, and the matched values are stored in the array named arr. When the regex matches, arr[2] holds everything between <code> and </code>, so printing arr[2] gives the desired lines.
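If GNU awk is not available, the same extraction can be sketched with plain POSIX awk, working line by line instead of on the whole file at once. This is a swapped-in technique, not the solution above. It assumes the opening <code> and the closing </code> sit on different lines, as in the shown sample; the temporary file and the variable names are only illustrative.

```shell
# Recreate the sample from the question in a temporary file (illustrative name).
input=$(mktemp)
cat > "$input" <<'EOF'
<pre dir="ltr" data-xf-init="code-block" data-lang=""><code>-Fix numcer one/Two
-EMM Support
-Fix update < broken
-Add support patch</code></pre>
</div>
</div><b><br />
EOF

result=$(awk '
  # Opening tag: drop everything up to and including <code>, start printing.
  /<code>/ { sub(/.*<code>/, ""); inside = 1 }
  # Closing tag: drop it and the rest of the line, print, stop printing.
  inside && /<\/code>/ { sub(/<\/code>.*/, ""); print; inside = 0; next }
  # Any line in between is printed unchanged.
  inside
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Because only the text around the two tags is touched, a stray < inside the block (as in "-Fix update < broken") is left alone.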
2nd solution: With GNU sed, using the -z and -E options, please try the following code.
sed -zE 's/(^|\n)<pre [^>]*".*<code>(-.*)<\/code>.*/\2/' Input_file
OR, if your sed version supports \n in the replacement, a slight change to the above code also prints a trailing newline:
sed -zE 's/(^|\n)<pre [^>]*".*<code>(-.*)<\/code>.*/\2\n/' Input_file
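For a sed without -z, a line-oriented variant can be sketched with an address range. This is an alternative technique, not the command above. It assumes <code> and </code> are on different lines, as in the shown sample (if both were on one line, the range would keep running to the next </code>). The temporary file and variable names are only illustrative.

```shell
# Recreate the sample from the question in a temporary file (illustrative name).
input=$(mktemp)
cat > "$input" <<'EOF'
<pre dir="ltr" data-xf-init="code-block" data-lang=""><code>-Fix numcer one/Two
-EMM Support
-Fix update < broken
-Add support patch</code></pre>
</div>
</div><b><br />
EOF

# Select the lines from the one holding <code> through the one holding </code>,
# trim the markup before <code> and from </code> onward, and print the result.
result=$(sed -n '/<code>/,/<\/code>/{s/.*<code>//;s/<\/code>.*//;p;}' "$input")

printf '%s\n' "$result"
rm -f "$input"
```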
3rd solution: With GNU grep, please try the following code:
grep -zoP '(^|\n)<pre [^>]*".*?<code>\K-(.*?\n[^\n]+)+(?=</code>)' Input_file
CodePudding user response:
Try this
sed 's/<[^>]*>//g' <file
It will remove everything between each < and the next > on the same line (linewise).
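On the shown sample, the lines that contain nothing but tags become empty, so a second expression that deletes empty lines may be wanted. A minimal sketch, assuming the file holds only the snippet from the question (any other text in the file would be kept as well); the temporary file and variable names are only illustrative:

```shell
# Recreate the sample from the question in a temporary file (illustrative name).
input=$(mktemp)
cat > "$input" <<'EOF'
<pre dir="ltr" data-xf-init="code-block" data-lang=""><code>-Fix numcer one/Two
-EMM Support
-Fix update < broken
-Add support patch</code></pre>
</div>
</div><b><br />
EOF

# First expression strips every <...> tag on each line;
# second expression deletes lines that are left empty.
result=$(sed -e 's/<[^>]*>//g' -e '/^$/d' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

The lone < in "-Fix update < broken" survives because there is no > after it on that line.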