Extract content of code which start with a curly bracket and ends with a curly bracket followed by c

Time:01-21

I'm struggling with regular expressions right now (lack of practice). I'm writing a Node script that goes through a bunch of JS files. Each file calls a function, and one of its arguments is a JSON-like object literal. The aim is to collect all those arguments and place them in one file. The problem I'm facing at the moment is extracting the argument from the code. Here is the function call part of that string:

$translateProvider.translations('de', {
        WASTE_MANAGEMENT: 'Abfallmanagement',
        WASTE_TYPE_LIST: 'Abfallarten',
        WASTE_ENTRY_LIST: 'Abfalleinträge',
        WASTE_TYPE: 'Abfallart',
        TREATMENT_TYPE: 'Behandlungsart',
        TREATMENT_TYPE_STATUS: 'Status Behandlungsart',
        DUPLICATED_TREATMENT_TYPE: 'Doppelte Behandlungsart',
        TREATMENT_TYPE_LIST: 'Behandlungsarten',
        TREATMENT_TARGET_LIST: 'Ziele Behandlungsarten',
        TREATMENT_TARGET_ADD: 'Ziel Behandlungsart hinzufügen',
        SITE_TARGET: 'Gebäudeziel',
        WASTE_TREATMENT_TYPES: 'Abfallbehandlungsarten',
        WASTE_TREATMENT_TARGETS: '{{Abfallbehandlungsziele}}',
        WASTE_TREATMENT_TYPES_LIST: '{{Abfallbehandlungsarten}}',
        WASTE_TYPE_ADD: 'Abfallart hinzufügen',
        UNIT_ADD: 'Einheit hinzufügen'
})

So I'm trying to write a regular expression that matches the segment of the JS code that starts with "'de', {" and ends with "})", with any characters in between (single and double curly brackets included). I tried something like this \'de'\s*,\s*{([^}]*)})\ , but that doesn't work. The furthest I got was with this \'de'\s*,\s*{([^})]*)}\ , but it ends at the first closing curly bracket inside the JSON, which is not what I want. It seems I have completely forgotten even the concepts of regular expressions that I understood before. Any help is much appreciated.

CodePudding user response:

This can be done with lookahead, lookbehind, and boundary-type assertions:

/(?<=^\$translateProvider\.translations\('de', {)[\s\S]*(?=}\)$)/
  • (?<=^\$translateProvider\.translations\('de', {) is a lookbehind assertion that checks for '$translateProvider.translations('de', {' at the beginning of the string. Because of the ^ anchor, nothing may precede the call.
  • (?=}\)$) is a lookahead assertion that checks for '})' at the end of the string. Because of the $ anchor, nothing may follow it.
  • [\s\S]* matches any sequence of characters between the two assertions, newlines included, since the class [\s\S] covers both whitespace and non-whitespace characters.
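As a minimal sketch of how this pattern could be used from a Node script, the snippet below applies it to a shortened, hypothetical copy of the input from the question. The variable names (source, bodyRegex, body) are illustrative assumptions. It also assumes a runtime with lookbehind support and that the string contains nothing but the function call, since the pattern is anchored with ^ and $.

```javascript
// Hypothetical, shortened sample of the input shown in the question.
const source = `$translateProvider.translations('de', {
        WASTE_TYPE: 'Abfallart',
        WASTE_TREATMENT_TARGETS: '{{Abfallbehandlungsziele}}',
        UNIT_ADD: 'Einheit hinzufügen'
})`;

// Lookbehind for the opening of the call, lookahead for the closing "})".
const bodyRegex = /(?<=^\$translateProvider\.translations\('de', {)[\s\S]*(?=}\)$)/;

// match[0] is everything between "{" and "})"; trim the surrounding whitespace.
const match = source.match(bodyRegex);
const body = match ? match[0].trim() : null;

console.log(body);
```

Because [\s\S]* is greedy and the lookahead only succeeds at the final "})", the inner "}}" sequences in the values do not end the match early.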


Hope this helps.

CodePudding user response:

You did not state the desired output. Here is a solution that parses the text and creates an array of [key, value] arrays. You can easily transform that into the desired output.

const input = `$translateProvider.translations('de', {
        WASTE_MANAGEMENT: 'Abfallmanagement',
        WASTE_TYPE_LIST: 'Abfallarten',
        WASTE_ENTRY_LIST: 'Abfalleinträge',
        WASTE_TYPE: 'Abfallart',
        TREATMENT_TYPE: 'Behandlungsart',
        TREATMENT_TYPE_STATUS: 'Status Behandlungsart',
        DUPLICATED_TREATMENT_TYPE: 'Doppelte Behandlungsart',
        TREATMENT_TYPE_LIST: 'Behandlungsarten',
        TREATMENT_TARGET_LIST: 'Ziele Behandlungsarten',
        TREATMENT_TARGET_ADD: 'Ziel Behandlungsart hinzufügen',
        SITE_TARGET: 'Gebäudeziel',
        WASTE_TREATMENT_TYPES: 'Abfallbehandlungsarten',
        WASTE_TREATMENT_TARGETS: '{{Abfallbehandlungsziele}}',
        WASTE_TREATMENT_TYPES_LIST: '{{Abfallbehandlungsarten}}',
        WASTE_TYPE_ADD: 'Abfallart hinzufügen',
        UNIT_ADD: 'Einheit hinzufügen'
})`;

const regex1 = /\.translations\([^{]*\{\s+(.*?)\s*\}\)/s;
const regex2 = /',[\r\n]+\s*/;
const regex3 = /: +'/;
let result = [];
let m = input.match(regex1);
if(m) {
  result = m[1].split(regex2).map(line => line.split(regex3));
}
console.log(result);

Explanation of regex1:

  • \.translations\( -- literal .translations(
  • [^{]* -- anything not {
  • \{\s+ -- { and all whitespace
  • (.*?) -- capture group 1 with non-greedy scan up to:
  • \s*\}\) -- whitespace, followed by })
  • s flag to make . match newlines

Explanation of regex2:

  • ',[\r\n]+\s* -- literal ', followed by newlines and whitespace (to split lines)

Explanation of regex3:

  • : +' -- literal :, one or more spaces, and ' (to split key/value)
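To illustrate the "transform that into the desired output" step, here is a rough sketch that turns an array of [key, value] pairs into an object and then into JSON text, which a script could write to a single file. The sample pairs array is a hypothetical stand-in for the parsed result, and the names pairs, translations, and json are assumptions. The sketch also assumes that a value may still carry a trailing quote when no "'," separator follows it, and strips that quote.

```javascript
// Hypothetical sample in the assumed shape of the parsed result:
// an array of [key, value] pairs. The last value is given a leftover
// closing quote on purpose to show the cleanup step.
const pairs = [
  ['WASTE_TYPE', 'Abfallart'],
  ['WASTE_TREATMENT_TARGETS', '{{Abfallbehandlungsziele}}'],
  ['UNIT_ADD', "Einheit hinzufügen'"],
];

// Build a plain object, removing a single trailing quote if one is present.
const translations = Object.fromEntries(
  pairs.map(([key, value]) => [key, value.replace(/'$/, '')])
);

// Serialize with two-space indentation so the output is easy to read.
const json = JSON.stringify(translations, null, 2);
console.log(json);
```

Going through Object.fromEntries and JSON.stringify means the output is valid JSON (double-quoted keys and values) rather than a copy of the original object literal syntax.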

Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex
