Automatically parsing language strings from blade files

Time:10-18

I would like to parse language strings out of Blade files ({{ __('some phrase') }}) and store them automatically in language files. I tried a plugin for this, which worked all right but did not allow functions to be run. I created the regex below.

{                              // Opening curly brace
    (?:\s+?)?                  // Optional spacing
    {                          // Opening curly brace
        (?:\s+?)?              // Optional spacing
        (?:.+?)?               // Optional function
            __\(               // Begin translation function
                (?:\s+?)?      // Optional spacing
                    (?:'|\")   // Single or double quote

                        (.+?)  // Actual string to capture

                    (?:'|\")   // Single or double quote
                (?:\s+?)?      // Optional spacing
            (?:\)|,)           // End translation function, basically
        (?:.+?)?               // Ending optional function
        (?:\s+?)?              // Optional spacing
    }                          // Ending curly brace
    (?:\s+?)?                  // Optional spacing
}                              // Ending curly brace
"/{(?:\s+?)?{(?:\s+?)?(?:.+?)?__\((?:\s+?)?(?:'|\")(.+?)(?:'|\")(?:\s+?)?(?:\)|,)(?:.+?)?(?:\s+?)?}(?:\s+?)?}/"

I know this isn't perfect. How can I improve it to catch edge cases better, such as an ending sequence inside the string cutting the capture short in the middle?

CodePudding user response:

The pattern that you tried could also be written as

{\s*{.*?__\(\s*['"](.+?)['"]\s*[,)].*?}\s*}

A few notes

  • (?:\s+?)? can be written as \s*?
  • (?:'|\") can be written using a character class as ['"]
  • (?:.+?)? can be written as .*?
  • (?:\)|,) can be written as [),]
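
The equivalences in these notes can be checked quickly. The following is a minimal Python sketch, not part of the original answer: the sample templates are made up for illustration, and it assumes Python's re module behaves like PCRE for the constructs used here.

```python
import re

# The question's pattern with every optional group written out in full.
ORIGINAL = re.compile(
    r"""{(?:\s+?)?{(?:\s+?)?(?:.+?)?__\((?:\s+?)?(?:'|\")(.+?)(?:'|\")"""
    r"""(?:\s+?)?(?:\)|,)(?:.+?)?(?:\s+?)?}(?:\s+?)?}"""
)

# The shortened form that uses \s*, .*? and character classes.
SIMPLIFIED = re.compile(r"""{\s*{.*?__\(\s*['"](.+?)['"]\s*[,)].*?}\s*}""")

# Hypothetical Blade snippets, chosen only to exercise both patterns.
samples = [
    "{{ __('some phrase') }}",
    '{{__("Save changes")}}',
    "{{ ucfirst(__('hello', ['name' => $n])) }}",
    "<p>{{ $title }}</p>",
]

for sample in samples:
    # Both patterns should capture the same strings from each sample.
    print(sample, ORIGINAL.findall(sample), SIMPLIFIED.findall(sample))
```

This only shows that the two forms agree on these samples; it is not a proof that they agree on every input.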

To keep the capture from breaking on a single or double quote inside the string, you could use two negated character classes in an alternation, each matching from the opening quote up to the matching closing quote.

{\s*{.*?__\(\s*("[^"]*"|'[^']*')\s*[,)].*?}\s*}

Regex demo

  • {\s*{ Match {{ with optional whitespace chars in between
  • .*?__ Match as few chars as possible, then match __
  • \(\s* Match ( and optional whitespace chars
  • ( Capture group 1
    • "[^"]*" Match from an opening " till closing "
    • | Or
    • '[^']*' Match from an opening ' till closing '
  • ) Close group 1
  • \s*[,)] Match optional whitespace chars and either , or )
  • .*? Match as few chars as possible
  • }\s*} Match }} with optional whitespace chars in between
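
To see what the quoted alternation changes, here is a small Python sketch (an illustration, not from the original answer). The names QUOTED, LAZY and extract, and the sample template, are hypothetical. Group 1 of the alternation pattern includes the surrounding quotes, so the helper strips them.

```python
import re

# Lazy capture: stops at the first quote that is followed by , or ).
LAZY = re.compile(r"""{\s*{.*?__\(\s*['"](.+?)['"]\s*[,)].*?}\s*}""")

# Quoted alternation: the closing quote must be the same kind as the opening one.
QUOTED = re.compile(r"""{\s*{.*?__\(\s*("[^"]*"|'[^']*')\s*[,)].*?}\s*}""")

def extract(text):
    """Return captured phrases with the surrounding quotes removed."""
    return [m.group(1)[1:-1] for m in QUOTED.finditer(text)]

# A made-up phrase that contains the sequence ', inside double quotes.
template = """{{ __("Say 'yes', or 'no'") }}"""

print(LAZY.findall(template))  # capture is cut short at the inner ',
print(extract(template))       # the whole phrase is captured
```

Backslash-escaped quotes inside the string, such as 'Don\'t', are still not handled by [^']* and would need a more elaborate pattern.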

Note that .*? can match any character except a newline. For the example data, you can replace those 2 occurrences with \s* matching optional whitespace chars instead.

{\s*{\s*__\(\s*("[^"]*"|'[^']*')\s*[,)]\s*}\s*}

Regex demo
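
The trade-off between the two variants can be sketched in Python as follows. This is an illustration under assumptions: STRICT, LOOSE and phrases are hypothetical names, the sample templates are made up, and no DOTALL-style flag is in use.

```python
import re

# Only whitespace is allowed around the __() call.
STRICT = re.compile(r"""{\s*{\s*__\(\s*("[^"]*"|'[^']*')\s*[,)]\s*}\s*}""")

# Anything except a newline is allowed around the __() call.
LOOSE = re.compile(r"""{\s*{.*?__\(\s*("[^"]*"|'[^']*')\s*[,)].*?}\s*}""")

def phrases(pattern, text):
    """Return group 1 of every match with the surrounding quotes removed."""
    return [m.group(1)[1:-1] for m in pattern.finditer(text)]

multiline = "{{\n    __('Welcome back')\n}}"
wrapped = "{{ ucfirst(__('hello')) }}"

# \s* spans the line breaks, while .*? does not.
print(phrases(STRICT, multiline), phrases(LOOSE, multiline))

# .*? skips over the wrapping function call, while \s* does not.
print(phrases(STRICT, wrapped), phrases(LOOSE, wrapped))
```

So the \s* version handles calls spread over several lines but rejects anything else inside the braces, including wrapping functions and extra arguments.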

CodePudding user response:

I recently did something like this using this GitHub repo as the base:

The only thing I needed from here was the actual parsing regex and logic which can be found in this file

To keep it simple, what they did was parse for and match the actual functions that can be used to display translated strings: __, _t, @lang. That way, the regex no longer needs to search for and match the curly braces, which can be a pain in cases such as:

  1. filters: {{ __('whatever') | ucfirst }}
  2. conditional rendering: {{ $title ?? __('Whatever') }}

And there are many other cases that would make your regex crazy.

By matching the functions only, the pattern is pretty simple:

/(__|_t|@lang)\(\h*[\'"](.+)[\'"]\h*[\),]/U

Here is a Regex101 example showcasing the cases you are looking for
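
For a quick check outside PHP, the same idea can be sketched in Python. This is an approximation under stated assumptions, not the repo's code: Python's re has no \h and no /U flag, so [ \t] and a lazy .+? stand in for them, and the helper names use a grouped alternation, since square brackets would form a character class that matches a single character. find_translations, merge_phrases and the sample template are hypothetical. The merge step assumes a JSON-style language file in which the phrase itself is the key.

```python
import re

# Grouped alternation for the helper names.
# [ \t] stands in for PCRE's \h, and the lazy .+? for the /U flag.
TRANSLATION_CALL = re.compile(
    r"""(__|_t|@lang)\([ \t]*['"](.+?)['"][ \t]*[),]"""
)

def find_translations(template):
    """Return (function, phrase) pairs for every translation call found."""
    return TRANSLATION_CALL.findall(template)

def merge_phrases(existing, found):
    """Add missing phrases to a phrase-keyed mapping, keeping existing values."""
    merged = dict(existing)
    for _function, phrase in found:
        merged.setdefault(phrase, phrase)
    return merged

# Made-up template covering the filter and conditional cases.
template = """
{{ __('whatever') | ucfirst }}
{{ $title ?? __('Whatever') }}
@lang('Sign in')
{{ config('app.name') }}
"""

found = find_translations(template)
print(found)
print(merge_phrases({"Sign in": "Log in"}, found))
```

The config('app.name') call is not picked up here, which is the reason for preferring the alternation over a character class in this sketch.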
