How to obtain variables and values from string using regular expressions

Time:04-10

I have some dynamic strings in a "human-readable format" generated from database queries. I need to extract the variables and their current values directly from these strings. I've never used regular expressions before, so I tried to build one myself, but I'm stuck.

This is what my strings look like:

"NOT (Name = John AND Date = 08.04.2022 AND (Status = Not active OR Status = Iddle) OR Surname != Doe)"

I tried several expressions without success; this is the only one that gave me some results:

/\w*\ \=\ \w*/ig

It returns the following correct matches:

Name = John
Status = Iddle

And these correct (but incomplete) matches:

Date = 08
Status = Not

There is a != operator between the variable Surname and the value Doe. To handle it, I tried changing the = to != in the expression, but that didn't work. I also tried putting a | (an OR operator, as far as I understand) between them, like =|!=, but that doesn't work either.

Analyzing the query itself, each condition is made up of:

  1. Variable (any word, character, symbol, can include whitespaces, or a mix of those)
  2. Whitespace
  3. Operator (=, !=, <, <=, >, >=)
  4. Whitespace
  5. Value (same as variables)

This variable-operator-value can start with a whitespace or the ( character, and it can end with a whitespace or the ) character.

Any idea how I can build an expression for this?

CodePudding user response:

It is very complicated to do this with a single regex.

This is easier and more readable:

const str = "NOT (Name = John AND Date = 08.04.2022 AND (Status = Not active OR Status = Idle) OR Surname != Doe)"
const statements = str
  .replace(/[()]/g,"")                    // ( or )
  .split(/ AND | OR |\s?NOT/)             // AND OR NOT with optional space to handle the first NOT
  .filter(word => word.includes("="))     // get rid of all the fluff
const namePairs = statements.map(statement => {
  const parts = statement.split(/ !?= /)  // = or !=
  return { [parts[0].trim()]: parts[1].trim() }
})
console.log(namePairs)
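
The snippet above only splits on = and !=, and its filter keeps only statements that contain an = sign. As a rough sketch of how the same split-based idea could be extended to the other operators listed in the question (<, <=, >, >=), something like the following might work. The function name `parseConditions` and the `{ variable, operator, value }` output shape are assumptions made for illustration; they are not part of the original answer.

```javascript
// Hypothetical helper: split-based parsing extended to all comparison operators.
function parseConditions(str) {
  return str
    .replace(/[()]/g, "")                 // drop all parentheses
    .split(/\s+(?:AND|OR)\s+|\bNOT\s+/)   // split on upper-case AND / OR / NOT keywords
    .map(part => part.trim())
    // variable, operator, value; two-character operators are listed
    // before their one-character prefixes
    .map(part => part.match(/^(.+?)\s+(!=|<=|>=|=|<|>)\s+(.+)$/))
    .filter(Boolean)                      // discard fragments that are not conditions
    .map(([, variable, operator, value]) => ({ variable, operator, value }));
}

const sample = "NOT (Name = John AND Date = 08.04.2022 AND (Status = Not active OR Status = Idle) OR Surname != Doe)";
console.log(parseConditions(sample));
```

Because the keyword split is case-sensitive, a value such as "Not active" stays in one piece. A value that itself contains an upper-case AND, OR or NOT surrounded by spaces would still be split, so this remains a sketch rather than a full parser.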

CodePudding user response:

There are many possibilities for the values. The following case-insensitive regex should be a start for you. It recognizes single-quoted strings (which can contain embedded single quotes if escaped with a backslash) or any other value made up of a sequence of non-whitespace characters, which covers identifiers and numbers. It also handles a value preceded by NOT (case-insensitive), for boolean values:

\b(\w+)\b\s*(=|!=|<|<=|>|>=)\s*('(?:\\'|[^'])*'|(?:NOT\s+)?[\w.+-]+)

  1. \b(\w+)\b - Capture group 1: Match a sequence of word characters on a word boundary (the variable name).
  2. \s*(=|!=|<|<=|>|>=)\s* - Capture group 2: The possible operators, possibly separated from the variable name and value by whitespace.
  3. ( - Start of capture group 3 (the value).
  4. '(?:\\'|[^'])*' - First alternative: single-quoted string, possibly containing escaped single quotes.
  5. | - or.
  6. (?:NOT\s+)?[\w.+-]+ - Second alternative: sequence of special non-whitespace characters (word characters, '+', '-', '.'), optionally preceded by NOT followed by whitespace.
  7. ) - End of capture group 3.
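
To show how a pattern like this could be applied in JavaScript, here is a minimal sketch using `String.prototype.matchAll`. It assumes the pattern is written with `+` quantifiers (`\w+`, `\s+`, `[\w.+-]+`); the helper name `extractConditions` and the returned object shape are hypothetical and only serve the illustration.

```javascript
// Assumed form of the answer's pattern, with the `+` quantifiers in place.
// Flags: g (find every match) and i (case-insensitive, so "Not" counts as NOT).
const conditionRe = /\b(\w+)\b\s*(=|!=|<|<=|>|>=)\s*('(?:\\'|[^'])*'|(?:NOT\s+)?[\w.+-]+)/gi;

// Hypothetical helper: returns one object per variable-operator-value match.
function extractConditions(str) {
  return Array.from(str.matchAll(conditionRe), ([, variable, operator, value]) => ({
    variable,
    operator,
    value,
  }));
}

const sample = "NOT (Name = John AND Date = 08.04.2022 AND (Status = Not active OR Status = Idle) OR Surname != Doe)";
console.log(extractConditions(sample));
```

Note the limits of this sketch: the variable name is restricted to word characters, so names containing whitespace are not matched as a whole, and an unquoted multi-word value is only captured in full when it starts with NOT (as in "Not active").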