How to get only the text inside a div tag and the src content of an img tag by using regular expression javasc

Time:10-26

I have a string generated in this format:

'fdffddf<div><br> <div><img style="max-width: 7rem;" src="folder/myimg.jpg"><div> <br></div></div><div><div> <br></div></div></div>'.

I want a regular expression that returns an array containing only the text content (without the div tags) and the source of the image.

Expected result in my case: [ 'fdffddf', 'folder/myimg.jpg' ]

I tried this:

let str = 'fdffddf<div><br> <div><img style="max-width: 7rem;" src="folder/myimg.jpg"><div> <br></div></div><div><div> <br></div></div></div>'
console.log('only content without div tag and src image only without img tag : ',str.match(/<img [^>]*src="[^"]*"[^>]*>/gm)[0]) 

It doesn't work: I only get the whole img tag.

How can I do this?

CodePudding user response:

const str = 'fdffddf<div><br> <div><img style="max-width: 7rem;" src="folder/myimg.jpg"><div> <br></div></div><div><div> <br></div></div></div>';
let arr = str
  .replace(/<img\b.*?src="([^"]*).*?>/, '$1') // extract img src
  .split(/<\/?[a-z]\w*\b[^>]*>/i)  // split on HTML tags
  .filter(s => s.trim()); // filter out empty items and spaces only items
console.log(arr);

Output:

[
  "fdffddf",
  "folder/myimg.jpg"
]

Explanation of .replace() regex:

  • <img -- start of tag with tag name
  • \b -- word boundary
  • .*? -- non-greedy scan until:
  • src=" -- literal src=" text
  • ([^"]*) -- capture group with everything not a double quote
  • .*? -- non-greedy scan until:
  • > -- end of tag

Explanation of .split() regex:

  • < -- start of tag
  • \/? -- optional slash (end tag)
  • [a-z]\w* -- tag name: a single alpha char followed by zero or more word chars
  • \b -- word boundary after tag name
  • [^>]* -- scan over anything not end of tag
  • > -- end of tag
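
To make the two explanations above more concrete, here is a minimal sketch that runs the same chain one step at a time and logs each intermediate value. The variable names (html, withSrc, pieces, result) are only illustrative and are not part of the original answer.

```javascript
const html = 'fdffddf<div><br> <div><img style="max-width: 7rem;" src="folder/myimg.jpg"><div> <br></div></div><div><div> <br></div></div></div>';

// Step 1: replace the whole <img ...> tag with its captured src value ($1).
const withSrc = html.replace(/<img\b.*?src="([^"]*).*?>/, '$1');

// Step 2: split on every remaining opening or closing tag.
const pieces = withSrc.split(/<\/?[a-z]\w*\b[^>]*>/i);

// Step 3: drop pieces that are empty or contain only whitespace.
const result = pieces.filter(s => s.trim());

console.log(withSrc); // the string with the img tag swapped for its src
console.log(pieces);  // many empty and whitespace-only items between tags
console.log(result);  // only the non-blank items remain
```

Note that the .replace() regex has no g flag, so only the first img tag is replaced. A string with several images would presumably need the g flag added to that first regex.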

Please keep in mind that using regex to parse HTML is error-prone. If you want to play it safe, it is better to use an HTML parser.
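
As a rough idea of what the parser route could look like, here is a sketch that assumes the code runs in a browser, where DOMParser, NodeFilter and Node exist as globals. The function name extractWithParser is hypothetical. Outside a browser (for example in plain Node.js) it simply returns null.

```javascript
// Browser-only sketch: walks the parsed document in order and collects
// non-blank text nodes plus the src attribute of every img element.
function extractWithParser(html) {
  // Assumption: DOM globals are only available in a browser environment.
  if (typeof DOMParser === 'undefined') return null;

  const doc = new DOMParser().parseFromString(html, 'text/html');
  const walker = doc.createTreeWalker(
    doc.body,
    NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT
  );

  const out = [];
  while (walker.nextNode()) {
    const node = walker.currentNode;
    if (node.nodeType === Node.TEXT_NODE) {
      const text = node.nodeValue.trim();
      if (text) out.push(text); // skip whitespace-only text nodes
    } else if (node.nodeName === 'IMG' && node.hasAttribute('src')) {
      out.push(node.getAttribute('src'));
    }
  }
  return out;
}

const input = 'fdffddf<div><br> <div><img style="max-width: 7rem;" src="folder/myimg.jpg"><div> <br></div></div><div><div> <br></div></div></div>';
console.log(extractWithParser(input));
```

Unlike the regex version, this does not depend on attribute order or on double quotes around the src value, because the parser handles those details.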

CodePudding user response:

You can do it with the following regex:

/^(?<!<)(\b[^<>]+\b)(?!>).*(?<=")(\b.+\b)(?=")/

This regex uses two capturing groups, one for the string at the beginning and one for the image source.

Try the following code:

const string = 'fdffddf<div><br> <div><img style="max-width: 7rem;" src="folder/myimg.jpg"><div> <br></div></div><div><div> <br></div></div></div>';
const regex = /^(?<!<)(\b[^<>]+\b)(?!>).*(?<=")(\b.+\b)(?=")/;
const match = string.match(regex);
console.log(`text: ${match[1]} - source: ${match[2]}`);
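
One thing to watch with this approach: String.prototype.match returns null when nothing matches, so reading match[1] would then throw. Below is a small sketch of a null-safe variant. It uses a pattern along the lines of the one above, written with + quantifiers and with named groups (text, src) added purely for readability. The helper name extractTextAndSrc is hypothetical.

```javascript
// Returns [text, src] on success, or null when the input does not match.
function extractTextAndSrc(input) {
  const regex = /^(?<!<)(?<text>\b[^<>]+\b)(?!>).*(?<=")(?<src>\b.+\b)(?=")/;
  const match = input.match(regex);
  return match ? [match.groups.text, match.groups.src] : null;
}

const sample = 'fdffddf<div><br> <div><img style="max-width: 7rem;" src="folder/myimg.jpg"><div> <br></div></div><div><div> <br></div></div></div>';
console.log(extractTextAndSrc(sample));        // [ 'fdffddf', 'folder/myimg.jpg' ]
console.log(extractTextAndSrc('<div></div>')); // null
```

The pattern relies on the text coming before the first tag and on double-quoted attribute values, so it should be treated as specific to strings shaped like this one.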