Regular expression to determine key

Time:11-19

In my Angular application, I'm using a translation service that supports custom transpilers. The transpiler receives incoming translations based on the content of the translation files. Some are plain translations, and you can also pass objects into translations (e.g. the label.profile.greeting translation key):

{
  "label.startpage.welcome": "Welcome!",
  "label.startpage.info": "This website is awesome. Hope you like it.",
  "label.profile.greeting": "Hi, {{username}}."
}

When the service tries to translate, it loops through all words within a translation and sends that word to the transpiler. The transpiler then tries to determine if it's a simple translation, or an object that needs to be injected into the translation.

My transpiler, however, can also inject translations into other translations:

{
  "label.favourite": "My favourite fruit is",
  "label.favourite.banana": "{{label.favourite}} banana!",
  "label.favourite.pineapple": "{{label.favourite}} pineapple!"
}
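To make the nesting concrete, here is a minimal sketch of how such an injection could be resolved. The `resolve` and `hasKey` helpers and the depth limit are assumptions made for illustration; they are not part of the translation service or of the transpiler described above.

```typescript
type Translations = Record<string, string>;

const translations: Translations = {
  "label.favourite": "My favourite fruit is",
  "label.favourite.banana": "{{label.favourite}} banana!",
  "label.favourite.pineapple": "{{label.favourite}} pineapple!",
};

// Hypothetical helper: own-property check, so inherited names such as "toString" are not treated as keys.
const hasKey = (dict: Translations, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(dict, key);

// Hypothetical helper: replaces {{key}} with the translation for that key when one exists.
// Placeholders that are not keys (for example {{username}}) are left untouched.
function resolve(key: string, dict: Translations, depth: number = 0): string {
  if (!hasKey(dict, key)) return key;
  const template = dict[key];
  if (depth > 5) return template; // assumed limit, guards against circular references
  return template.replace(/\{\{([^{}]+)\}\}/g, (match: string, inner: string) =>
    hasKey(dict, inner) ? resolve(inner, dict, depth + 1) : match
  );
}

console.log(resolve("label.favourite.banana", translations)); // My favourite fruit is banana!
```

In this sketch the decision "is this placeholder a key?" is made by a dictionary lookup; the transpiler in the question makes the same decision with a regular expression instead.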

I've written this simple arrow-function that my transpiler uses to determine if the incoming translation is a translation key or not:

const isTranslationKey = (value: string): boolean => /(\w+(?:\.\w+)+)/.test(value);

And it works. However, according to SonarQube, this regular expression poses a high security risk because regexes are security-sensitive. I guess the cause is the length of the string, which may lead to failures in the future since there's no maximum limit on it. I've tried changing the regex to something similar, but I can't judge whether it's a better fit or not:

/^([A-Za-z]{1,10})+(\.([A-Za-z]{1,10})).{1,40}$/

I need some expertise on this matter. Thanks in advance! :)

CodePudding user response:

You can use either of the following two regexes, depending on whether you need to support all Unicode letters or not:

/^(?=.{3,60}$)[a-z]+(?:\.[a-z]+)+$/i
/^(?=.{3,60}$)\p{L}+(?:\.\p{L}+)+$/u

Note that the second regex requires ECMAScript 2018 support, so make sure your JavaScript environment supports it before using it.
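If you are unsure whether the target environment supports Unicode property escapes, one possible approach is to build the pattern at runtime and fall back to the ASCII version. This is only a sketch; `buildKeyPattern` is a hypothetical name, not something the translation service provides.

```typescript
// Hypothetical helper: returns the Unicode-aware pattern when the engine supports \p{L},
// otherwise the ASCII-only pattern.
function buildKeyPattern(): RegExp {
  try {
    // Built from a string so that an older engine throws here at runtime
    // instead of failing to parse the whole file.
    return new RegExp("^(?=.{3,60}$)\\p{L}+(?:\\.\\p{L}+)+$", "u");
  } catch (_err) {
    return /^(?=.{3,60}$)[a-z]+(?:\.[a-z]+)+$/i;
  }
}

const keyPattern = buildKeyPattern();
console.log(keyPattern.test("label.profile.greeting")); // true
```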

Details:

  • ^ - start of string
  • (?=.{3,60}$) - a positive lookahead that requires three to sixty chars other than line break chars, followed by the end of string position, immediately to the right of the current location
  • [a-z]+ - one or more ASCII letters (any Unicode letters if you use \p{L} or \p{Alphabetic})
  • (?:\.[a-z]+)+ - one or more repetitions of . followed by one or more ASCII letters (any Unicode letters if you use \p{L} / \p{Alphabetic})
  • $ - end of string.
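As a rough illustration of how the pieces above behave together, the sketch below plugs the ASCII pattern into the arrow function from the question and compares it with the original unanchored expression. The constant names `KEY_PATTERN` and `unanchored` and the sample strings are made up for this example.

```typescript
const KEY_PATTERN = /^(?=.{3,60}$)[a-z]+(?:\.[a-z]+)+$/i;
const isTranslationKey = (value: string): boolean => KEY_PATTERN.test(value);

// The original expression from the question, kept only for comparison.
const unanchored = /(\w+(?:\.\w+)+)/;

console.log(isTranslationKey("label.profile.greeting"));  // true
console.log(isTranslationKey("username"));                // false: no dot-separated segment
console.log(isTranslationKey("Visit example.com today")); // false: the whole string must match
console.log(unanchored.test("Visit example.com today"));  // true: matches the substring "example.com"
console.log(isTranslationKey("a." + "b".repeat(100)));    // false: longer than 60 characters
```

One difference to be aware of: \w also accepts digits and underscores, while [a-z] with the i flag accepts letters only, so a key such as label.step1.title passes the original check but not this one. If your keys contain such characters, the character class would need to be widened accordingly.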

