Regex: Add Trademark Symbol after the first word

Time:02-19

I'm very new to regex and I'm trying to add a registered trademark symbol (®) after the first word.

This is my String:

productname L 67 MWA/Y

This is what it should look like:

productname® L 67 MWA/Y

Basically I have to find the end position of the first word in any given string and add a ®. I just can't figure out how to do it properly.

I am using docparser which gives a function to find a regular expression and replace it with something.

This is their information on how it's used: https://support.docparser.com/article/1290-how-does-the-regular-expression-regex-filter-work

This site refers to https://regex101.com/ so it should be the same syntax.

CodePudding user response:

The flags

I would use a regexp with the m flag for multiline and the u flag for Unicode support (in case the product name contains a non-Latin character).

The pattern

^(\p{L}+)\b

  • ^ to match the beginning of the line.
  • ( ) to capture the product name or manufacturer.
  • \p{L}+ to match any letter one or more times. Most people would use \w for any word character, but depending on the engine and flags \w may not match Ä or õ, so it could be problematic.
  • \b (optional) to match the word boundary.

Test it here: https://regex101.com/r/nSl03I/1

As you can see, it does not handle spaces in the product name. You'll have to change the regexp for that, and to do so you need to know the format of the data that follows the product name.

Replacement

$1® where $1 is the captured product name.
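To try the pattern and replacement outside of regex101, here is a minimal Python sketch. It rests on a few assumptions: Python's built-in re module does not support \p{L}, so [^\W\d_] is used as a stand-in for "any Unicode letter"; the replacement is written \1® because Python uses \1 rather than $1; and add_trademark is just a hypothetical helper name. Docparser itself may behave differently.

```python
import re

# Python's re has no \p{L}; [^\W\d_] ("word char that is not a digit or
# underscore") is used here as a stand-in for "any Unicode letter".
# re.MULTILINE plays the role of the m flag, so ^ matches at every line start.
FIRST_WORD = re.compile(r"^([^\W\d_]+)\b", re.MULTILINE)


def add_trademark(text: str) -> str:
    """Append ® to the first word of every line (hypothetical helper)."""
    # \1 is Python's spelling of $1: the captured product name.
    return FIRST_WORD.sub(r"\1®", text)


print(add_trademark("productname L 67 MWA/Y"))  # productname® L 67 MWA/Y
print(add_trademark("Ärmel XL 40"))             # Ärmel® XL 40
```

A line that does not start with a run of letters ending at a word boundary (for example one starting with a digit) is left unchanged.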

If you have to handle spaces in the manufacturer

This might be the case, and it is a bit more complicated because we don't know how many spaces there could be. But we could assume that the second value is the size of the product (XXS, XS, S, M, L, XL, XXL or even XXXL if we want).

In this case we could solve it with this regular expression:

/^([\p{L} ]+)\b\s+(X{0,3}[LS]|M)/gmui

I used the i flag for case-insensitive matching, so it works whether the size is lowercase or uppercase. The g flag is for global: it does not stop at the first occurrence found but continues through all matches.

  • instead of \p{L} we use [\p{L} ], a character class declared with brackets [ ] to which I just added a space. We could use \s instead of the space, but that would also match tabs or new lines, so I avoid it to be a bit safer.

  • we add \s+ after our previous regexp because we have to match the whitespace that follows the product name. This could be a tab.

  • for the size, it can be L or S with or without some X chars in front. X{0,3} will match "", "X", "XX" or "XXX", since {0,3} means "between 0 and 3 times". To say "S or L" we could use alternation inside a group. A plain ( | ) group is capturing, and we don't want an extra capture for just "S" or "L", so the non-capturing form (?:S|L) would be the one to use. But as these are single chars and not words, it's shorter to write [SL] for "one of these chars". The size could also be M (and not XM), so this leads to (X{0,3}[LS]|M). This outer group is capturing, so the size becomes $2 in the replacement.

Test it here: https://regex101.com/r/xHG97U/1
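The same idea can be sketched in Python. This is an approximation under stated assumptions: Python's re module has no \p{L}, so (?:[^\W\d_]| )+ stands in for [\p{L} ]+; the replacement is assumed to be of the form $1® $2 (written \1® \2 in Python), which also collapses the whitespace between name and size to a single space; and add_trademark_before_size is a hypothetical helper name.

```python
import re

# Stand-in for /^([\p{L} ]+)\b\s+(X{0,3}[LS]|M)/gmui in Python's re:
#   (?:[^\W\d_]| )+   letters or spaces (approximates [\p{L} ]+)
#   \s+               whitespace between the name and the size
#   (X{0,3}[LS]|M)    XXXS..S, L..XXXL, or M, captured as group 2
NAME_THEN_SIZE = re.compile(
    r"^((?:[^\W\d_]| )+)\b\s+(X{0,3}[LS]|M)",
    re.MULTILINE | re.IGNORECASE,
)


def add_trademark_before_size(text: str) -> str:
    """Insert ® between the product name and the size on every line."""
    # re.sub replaces every match, which corresponds to the g flag.
    return NAME_THEN_SIZE.sub(r"\1® \2", text)


print(add_trademark_before_size("productname L 67 MWA/Y"))  # productname® L 67 MWA/Y
print(add_trademark_before_size("Super Mega xl 40"))        # Super Mega® xl 40
```

Because the name group is greedy, the engine tries the longest possible name first and backtracks until a size token follows it, which is how "Super Mega" stays together in the second example.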

CodePudding user response:

Replace the first space with ® followed by a space, for example: echo "test string 1" | sed 's/ /® /'
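For comparison, the same "replace only the first space" approach can be written without any regex. This is a minimal Python sketch with a hypothetical helper name, not something Docparser provides:

```python
def mark_first_word(line: str) -> str:
    """Replace only the first space with '® ' (hypothetical helper)."""
    # The third argument limits str.replace to a single replacement.
    return line.replace(" ", "® ", 1)


print(mark_first_word("test string 1"))           # test® string 1
print(mark_first_word("productname L 67 MWA/Y"))  # productname® L 67 MWA/Y
```

Like the sed command, this leaves a line untouched when it contains no space at all, so a product name standing alone on a line would not get the symbol.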
