I have an HTML document from which I would like to extract specific parts of a line, and do this across the whole document wherever that type of line is present.
I've included a code sample that explains what I need much better than my wording :D
You can see that the <div> line appears a few times with different values. I would like to avoid cleaning this up manually and extract only the important part between the quotation marks, so that I get something like this:
ab.cdefghi, ab.cdefghi
cd.cdefghi, cd.cdefghi
ef.cdefghi, ef.cdefghi
Here's the full code example:
<html itemscope itemtype="https://schema.org/Product">
<head>
<meta content="text/html;charset=utf-8" http-equiv="Content-Type">
<meta content="utf-8" http-equiv="encoding">
<title></title>
<div top="ab.cdefghi" bot="ab.cdefghi">
<meta content="text/html;charset=utf-8" http-equiv="Content-Type">
<meta content="utf-8" http-equiv="encoding">
<title></title>
<div top="cd.cdefghi" bot="cd.cdefghi">
<meta content="text/html;charset=utf-8" http-equiv="Content-Type">
<meta content="utf-8" http-equiv="encoding">
<title></title>
<div top="ef.cdefghi" bot="ef.cdefghi">
<meta content="text/html;charset=utf-8" http-equiv="Content-Type">
<meta content="utf-8" http-equiv="encoding">
<title></title>
<script type="text/javascript">
var
</script>
CodePudding user response:
Try this regex find-and-replace; it gave me the result:
Find: ^(?:(?!<div).*\n?)|\G<div.*?"(.*?" )bot="(.*?)">
Replace All: $1$2
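If running a small script is an option instead of an editor find-and-replace, the same extraction could be sketched in Python along these lines. This is only a sketch under a few assumptions: every line of interest looks like the `<div top="..." bot="...">` lines in the question, with `top` before `bot`. The names `DIV_PATTERN`, `extract_pairs` and `sample` are made up for illustration.

```python
import re

# Assumption: each target line is a <div> with a top attribute followed by a
# bot attribute, both in double quotes, as in the sample from the question.
DIV_PATTERN = re.compile(r'<div\s+top="([^"]*)"\s+bot="([^"]*)"\s*>')


def extract_pairs(html):
    """Return one "top, bot" string per matching <div> line."""
    return [f"{top}, {bot}" for top, bot in DIV_PATTERN.findall(html)]


# A shortened copy of the sample document from the question.
sample = """<title></title>
<div top="ab.cdefghi" bot="ab.cdefghi">
<meta content="utf-8" http-equiv="encoding">
<div top="cd.cdefghi" bot="cd.cdefghi">
<title></title>
<div top="ef.cdefghi" bot="ef.cdefghi">
"""

for line in extract_pairs(sample):
    print(line)
```

For anything less regular than this sample (attributes in a different order, single quotes, extra attributes), a real HTML parser would likely be more robust than a regex.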