Extracting object from string

Time:08-31

I have a sample script from the WebdriverIO website that looks something like this (modified for testing):

const { remote } = require('webdriverio');
var assert = require('assert');

;(async () => {
        const browser = await multiremote({
            capabilities: {
                browserName: 'chrome',
            },
            capabilities: {
                browserName: 'firefox'
            }
    })
    await browser.url('https://www.google.com');
    var title = await browser.getTitle();
    assert.equal(title, 'Google');
    await browser.deleteSession()
})()

I need to extract each capabilities.browserName value from this script, which I hold as a string in str. So far I can extract them like this:

var browsers = str.match(/\scapabilities\s*?:\s*?\{[^{}]+\}/gi); // find capabilities
console.log(browsers.length); // check length
for(let i=0; i < browsers.length; i++){
    let b = browsers[i].replace(/\scapabilities\s*?:\s*?/, '') // remove capabilities: part
                       .replace(/'/g, '"' ) // convert single quote to double quote
                       .replace(/\s/g, '') // remove spaces, new lines, tabs etc.
                       .replace(/(\w+:)|(\w+ :)/g, function(matchedStr) { // wrap words in double quote as keys are not wrapped
                            return '"' + matchedStr.substring(0, matchedStr.length - 1) + '":';
                          });

    let browser = JSON.parse(b); // parse the string to convert to object
    console.log(browser.browserName); // Success !!!
}

This works fine, but sometimes capabilities can have nested properties, like this:

capabilities: {
   browserName: 'chrome',
   proxy: {
     proxyType: 'PAC',
     proxyAutoconfigUrl: 'http://localhost:8888',
   }
},

This breaks my code, which then returns only one browser, firefox. Note that await multiremote() could also be await remote().
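
A minimal reproduction of the failure may help here. It assumes the same capabilities pattern as in the extraction code; the flat and nested strings are shortened, hypothetical inputs. The character class [^{}]+ cannot cross an inner brace, so a capabilities block that contains a nested object never reaches its closing brace and is skipped entirely.

```javascript
// Same pattern as in the extraction code.
const capabilitiesPattern = /\scapabilities\s*?:\s*?\{[^{}]+\}/gi;

// Shortened, hypothetical inputs: one flat block, one with a nested object.
const flat = " capabilities: { browserName: 'firefox' }";
const nested = " capabilities: { browserName: 'chrome', proxy: { proxyType: 'PAC' } }";

const flatMatches = flat.match(capabilitiesPattern);
const nestedMatches = nested.match(capabilitiesPattern);

console.log(flatMatches.length); // 1
console.log(nestedMatches);      // null, because [^{}]+ stops at the inner "{"
```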

Thanks in advance

CodePudding user response:

If we can assume that there are no parentheses in the object literal that is passed to remote or multiremote, then we could capture that object literal as a whole by matching all that is between the parentheses of the function call.

I would not go through the effort of trying to convert JavaScript object literal syntax to JSON. Instead, we can assume that a browserName property only ever occurs inside a capabilities object, and just match the browserName key and the value that follows it (in single quotes).

Here is how that works out:

const str = `const { remote } = require('webdriverio');
var assert = require('assert');

;(async () => {
    const browser = await multiremote({
        myChromeBrowser: {
            capabilities: {
                browserName: 'chrome'
            }
        },
        myFirefoxBrowser: {
            capabilities: {
                browserName: 'firefox'
            }
        }
    })
    await browser.url('https://www.google.com');
    var title = await browser.getTitle();
    assert.equal(title, 'Google');
    await browser.deleteSession()
})()`;

               // find object literal, assuming it has no parentheses:
const browsers = str.match(/\bawait\s+(multi)?remote\s*\((.*?)\)/gs)
               // find browserName properties with single quoted string values:
               ?.[0]?.match(/\bbrowserName\s*:\s*'[^']*/g)
               // extract the corresponding values:
               ?.map(m => m.split("'")[1]);
console.log(browsers);
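
If the values might also be written with double quotes, the same idea can be sketched with a backreference so that the closing quote has to match the opening one. This is only an illustration under the same assumption as above (no parentheses inside the object literal); extractBrowserNames and sample are hypothetical names, not part of the code above.

```javascript
// Hypothetical helper: returns every browserName value found inside the
// first remote(...) or multiremote(...) call in the given source text.
function extractBrowserNames(source) {
  // Capture everything between the parentheses of the call (assumes the
  // object literal itself contains no parentheses).
  const call = source.match(/\bawait\s+(?:multi)?remote\s*\((.*?)\)/s);
  if (!call) return [];

  // (['"]) remembers the opening quote; \1 requires the same closing quote.
  const pattern = /\bbrowserName\s*:\s*(['"])(.*?)\1/g;
  return [...call[1].matchAll(pattern)].map(m => m[2]);
}

const sample = `
const browser = await remote({
    capabilities: {
        browserName: "chrome",
        proxy: { proxyType: 'PAC' }
    }
})`;

console.log(extractBrowserNames(sample)); // [ 'chrome' ]
```

Returning an empty array when no call is found keeps the caller from having to handle undefined.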

CodePudding user response:

You can write a quick parser that uses { and } to check the depth of the nesting. This assumes no { characters occur in quotes, keys aren't quoted, and values are quoted with single quotes, so it's not designed to handle every edge case, only those you've provided here.

const s = `const { remote } = require('webdriverio');
var assert = require('assert');

;(async () => {
        const browser = await multiremote({
            capabilities: {
                browserName: 'chrome',
            },
            capabilities: {
                browserName: 'firefox'
            },
            capabilities: {
               browserName: 'chrome',
               proxy: {
                 proxyType: 'PAC',
                 proxyAutoconfigUrl: 'http://localhost:8888',
               }
            },
    })
    await browser.url('https://www.google.com');
    var title = await browser.getTitle();
    assert.equal(title, 'Google');
    await browser.deleteSession()
})()`;

const results = [];
const re = /\bcapabilities: *{/g;

for (let match; match = re.exec(s);) {
  let braces = 0, i = match.index;
  
  for (; i < s.length; i++) {
    if (s[i] === "{") {
      braces++;
    }
    else if (s[i] === "}") {
      if (--braces <= 0) {
        i++;
        break;
      }
    }
  }

  results.push(
    s.slice(match.index, i)
     .match(/\bbrowserName: *'([^']*)'/)[1]
  );
}

console.log(results);
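
If braces could appear inside quoted values, the depth counter can be extended to skip over quoted text. The following is a rough sketch, not a full JavaScript tokenizer: findBlockEnd is a hypothetical helper, and it deliberately ignores comments, regex literals and ${...} inside template literals.

```javascript
// Hypothetical helper: returns the index just past the "}" that closes
// the "{" at openIndex, or -1 if the braces never balance.
function findBlockEnd(text, openIndex) {
  let depth = 0;
  let quote = null; // the quote character we are currently inside, if any

  for (let i = openIndex; i < text.length; i++) {
    const ch = text[i];
    if (quote) {
      if (ch === "\\") i++;                  // skip the escaped character
      else if (ch === quote) quote = null;   // closing quote
    } else if (ch === "'" || ch === '"' || ch === "`") {
      quote = ch;                            // opening quote
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}" && --depth === 0) {
      return i + 1;
    }
  }
  return -1;
}

const text = "capabilities: { browserName: 'chrome', note: 'a } in quotes' }, next: 1";
const end = findBlockEnd(text, text.indexOf("{"));
console.log(text.slice(0, end));
// capabilities: { browserName: 'chrome', note: 'a } in quotes' }
```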

Also, bear in mind your original string has two capabilities keys, so only the second would exist in the object in memory. I assume this is an artifact of adapting the source string to SO, so I didn't bother adjusting that here and simply slapped on a third to show the parsing working on the scope of the question.
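
The duplicate-key behaviour is easy to check in isolation. In this small sketch (config is a hypothetical variable name), the later capabilities entry silently replaces the earlier one, so the resulting object has a single key:

```javascript
// Two properties with the same name in one object literal:
// the second definition overwrites the first.
const config = {
  capabilities: { browserName: 'chrome' },
  capabilities: { browserName: 'firefox' },
};

console.log(Object.keys(config).length);      // 1
console.log(config.capabilities.browserName); // firefox
```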
