How to use sed to remove some characters from a file?

I have this content in a file:

<pre  dir="ltr" data-xf-init="code-block" data-lang=""><code>-Fix numcer one/Two
-EMM Support
-Fix update &lt; broken
-Add support patch</code></pre>
</div>
</div><b><br />

I need to remove the surrounding markup and keep just these lines:

-Fix numcer one/Two
-EMM Support
-Fix update &lt; broken
-Add support patch

I have tried this script:

#!/bin/bash
sed -n '/>-/,/</p' /home/Desktop/1 > /home/Desktop/2
sed -n '/^-*code>/p' /home/raed/Desktop/2  > /home/Desktop/3
sed -i 's#</code></pre>##' /home/Desktop/3
exit

But the script removes the first line:

-Fix numcer one/Two

CodePudding user response：

1st solution: With GNU awk, and with the samples you have shown, please try the following code.

awk -v RS="^$" '
match($0,/(^|\n)<pre [^"]*".*<code>(-.*)<\/code>/,arr){
  print arr[2]
}
'  Input_file

Explanation: Setting RS to "^$" is a GNU awk idiom that makes awk read the whole file as a single record. The match function then applies the regex above to that record and stores the text matched by its 2 capturing groups in the array arr. When the regex matches, arr[2] holds the lines between <code> and </code>, so printing it gives the desired output.
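If GNU awk is not available, a similar result can be sketched with a plain line-by-line approach instead of the whole-file match. This is a different technique from the one above, and it assumes the wanted text always sits between <code> and </code>. The temporary file and the variable names input and awk_out are only there for illustration.

```shell
#!/bin/sh
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
<pre  dir="ltr" data-xf-init="code-block" data-lang=""><code>-Fix numcer one/Two
-EMM Support
-Fix update &lt; broken
-Add support patch</code></pre>
</div>
</div><b><br />
EOF

# Small state machine: start printing after <code>, stop at </code>.
awk_out=$(awk '
/<code>/             { sub(/.*<code>/, ""); inside = 1 }                  # drop everything up to <code>
inside && /<\/code>/ { sub(/<\/code>.*/, ""); print; inside = 0; next }   # last line: drop </code> and the rest
inside               { print }                                            # lines in between are printed as-is
' "$input")

printf '%s\n' "$awk_out"
rm -f "$input"
```

The inside flag keeps the first line, because the text after <code> is printed by the same rules as the lines that follow it.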

2nd solution: With GNU sed, using the -z and -E options, please try the following code.

sed -zE 's/(^|\n)<pre [^"]*".*<code>(-.*)<\/code>.*/\2/' Input_file

Or, if your sed version supports \n in the replacement, a slight change to the sed code above adds a trailing newline to the output:

sed -zE 's/(^|\n)<pre [^"]*".*<code>(-.*)<\/code>.*/\2\n/' Input_file
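For a sed without the -z option, a line-range variant may be enough. The sketch below is not the command above but an alternative: it assumes <code> and </code> are on different lines, as in the sample, and the names input and sed_out are made up for the demo.

```shell
#!/bin/sh
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
<pre  dir="ltr" data-xf-init="code-block" data-lang=""><code>-Fix numcer one/Two
-EMM Support
-Fix update &lt; broken
-Add support patch</code></pre>
</div>
</div><b><br />
EOF

# Work only on the range of lines from <code> to </code>:
# strip the markup around the text, then print each line of the range.
sed_out=$(sed -n '/<code>/,/<\/code>/{
s/.*<code>//
s/<\/code>.*//
p
}' "$input")

printf '%s\n' "$sed_out"
rm -f "$input"
```

Because the substitution and the print both happen inside the range, the first line is cleaned up and kept rather than dropped.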

3rd solution: With GNU grep, please try the following code:

grep -zoP '(^|\n)<pre [^"]*".*?<code>\K-(.*?\n[^\n]+)+(?=</code>)'  Input_file

CodePudding user response：

Try this:

sed 's/<[^>]*>//g' <file

It removes every tag, that is, everything from a < up to the next > on the same line. Lines that contain only tags are left empty.
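On the sample from the question, the two closing lines contain nothing but tags, so they come out as empty lines. A minimal sketch that also deletes those empty lines, assuming blank lines are never wanted in the output (input and strip_out are illustrative names):

```shell
#!/bin/sh
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
<pre  dir="ltr" data-xf-init="code-block" data-lang=""><code>-Fix numcer one/Two
-EMM Support
-Fix update &lt; broken
-Add support patch</code></pre>
</div>
</div><b><br />
EOF

# First expression strips every <...> tag on each line;
# second expression deletes lines that are empty afterwards.
strip_out=$(sed -e 's/<[^>]*>//g' -e '/^$/d' "$input")

printf '%s\n' "$strip_out"
rm -f "$input"
```

The entity &lt; is left untouched because it contains no literal < character.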