Find "component" in plain text using regex and plain JavaScript

I'm aware this is a weird question, but I am terrible at writing regexes.

The problem is fairly simple: I have a bunch of plain text coming in, and that text contains mentions of React components.

For example:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Mauris fringilla maximus, sed < HelloThere /> velit porttitor sed. Fusce lacinia bibendum eros, a ultricies leo sodales eget.

I need to create a regex that allows me to extract that unknown React component so I can then wrap it with some mark-up automatically.

So the regex in the above example would return: "< HelloThere />"

The tricky part is it can be any React component. The component can also have props and children. This is an example of something in there as well: < Component>< Box>< Inline>< Text>Hello</ Text></ Inline></ Box></ Component>

So my initial idea was to find the opening "<" and then the closing "/>" and get everything in between. But I have no real clue how to go about actually doing that.

Any help is much appreciated!

PS Added spaces after the first angled bracket so SO doesn't try to mess with it

Edit:

It's becoming clear to me that regex might be too limited for this. I might need to figure out a clever JavaScript way, or add some tag or symbol at the beginning and end of the component that allows me to look it up more easily.

CodePudding user response：

This regex matches every valid component tag. If the component has children, it matches only the opening tag, and the second alternative also picks up plain text that sits directly between two tags.

/<[A-Z]\w*\b.*?>|(?<=>)(\w+)(?=<)/g

A valid component name starts with a capital letter, followed by any number of word characters. After the name there can be props, up to the end of the tag (the > sign).

JavaScript Example

let jsx = `<Component><Box><Inline><Text>Hello</Text></Inline></Box></Component>`;

console.log(jsx.match(/<[A-Z]\w*\b.*?>|(?<=>)(\w+)(?=<)/g))  // ["<Component>", "<Box>", "<Inline>", "<Text>", "Hello"]
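
Since the question asks for the whole component (children included) so that it can be wrapped in mark-up, one possible approach is to scan the tags with a regex and count nesting depth in plain JavaScript. The following is only a rough sketch under some assumptions: findComponents and wrapComponents are made-up helper names, the "<mark>" wrapper is a placeholder, prop values are assumed not to contain a ">" character, and closing tags are counted without checking that their names match the opening ones.

```javascript
// Hypothetical helper: returns the top-level components found in `text`
// as { start, end, source } objects, with nested children included in `source`.
function findComponents(text) {
  // Opening, closing, or self-closing tag whose name starts with a capital letter.
  // Assumption: no ">" appears inside prop values.
  const tag = /<\s*(\/?)\s*([A-Z]\w*)\b[^>]*?(\/?)>/g;
  const found = [];
  let depth = 0;   // how many components are currently open
  let start = -1;  // index where the current top-level component began
  let m;
  while ((m = tag.exec(text)) !== null) {
    const isClosing = m[1] === "/";
    const isSelfClosing = m[3] === "/";
    if (isClosing) {
      if (depth === 0) continue; // stray closing tag, ignore it
      depth -= 1;
    } else {
      if (depth === 0) start = m.index;
      if (!isSelfClosing) depth += 1;
    }
    // Back at depth 0 means a complete top-level component just ended.
    if (depth === 0 && start !== -1) {
      found.push({ start, end: tag.lastIndex, source: text.slice(start, tag.lastIndex) });
      start = -1;
    }
  }
  return found;
}

// Hypothetical helper: wraps every top-level component with `open` and `close`.
function wrapComponents(text, open, close) {
  let result = "";
  let cursor = 0;
  for (const { start, end, source } of findComponents(text)) {
    result += text.slice(cursor, start) + open + source + close;
    cursor = end;
  }
  return result + text.slice(cursor);
}

const input = "Mauris <HelloThere /> velit <Box><Text>Hello</Text></Box> eget.";
console.log(wrapComponents(input, "<mark>", "</mark>"));
```

The tag pattern tolerates whitespace after "<" and "</", so it should also cope with the spaced-out form used in the question. Anything beyond simple props (for example arrow functions inside a prop) would likely need a real JSX parser instead.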

CodePudding user response：

If the only characters you need to look at are < and />, then this regex is quite simple:

\<.*\/>

When applied to a string, this will return both the encapsulating angled brackets, and all content in between.

Short explanation

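The \ character in a regex escapes the character that follows it, so \< and \/> match a literal < and /> respectively (only the / strictly needs escaping, and only inside a JavaScript regex literal). The . character is essentially a wildcard that matches any single character except a line break, and the * repeats it zero or more times. Because * is greedy, the match runs from the first < to the last /> on the line. If you want it to stop at the first />, make the quantifier lazy by adding ? after the *:

\<.*?\/>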
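
To see how a greedy and a lazy quantifier behave on text with more than one self-closing tag, here is a small sketch. The sample string and variable names are made up for illustration.

```javascript
// Sample input with two self-closing components (illustrative only).
const text = "a <One /> b <Two /> c";

// Greedy: .* grabs as much as possible, so the match spans from the
// first "<" to the last "/>".
const greedy = text.match(/<.*\/>/g);

// Lazy: .*? grabs as little as possible, so each component is matched separately.
const lazy = text.match(/<.*?\/>/g);

console.log(greedy); // ["<One /> b <Two />"]
console.log(lazy);   // ["<One />", "<Two />"]
```

Note that neither variant handles a component with children such as <Box>...</Box>, because both only look for a closing "/>".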