Tokenize equation using regex

Time:07-18

I would like to tokenize equations using regex. My inputs would be something like this: -cos(x)+4ln(x^2.2)^4=-2(pi+x)

My desired output would look like this:

-cos
(
x
)
+
4
*
ln
(
x
^
2.2
)
^
4
=
-2
(
pi
+
x
)

I have tried the following regex: \s*(?:([()^+*\/-])|([a-z]+)|((?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)(?:e[+-]?[0-9]+)?)|(\S))


Here are the problems with this regex:

  • It does not produce a * between a number and a following function or variable
  • It does not produce a * between a number or variable and a following left parenthesis
  • A leading minus and the number, variable or function after it should form a single token
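The following minimal sketch makes these problems concrete. It assumes a JavaScript runtime with String.prototype.matchAll (ES2020 or later). It runs the regex from the question, with its `+` characters restored, and keeps whichever capture group matched. The function name tokenizeWithQuestionRegex is hypothetical.

```javascript
// The regex from the question, with the "+" quantifiers and the "+" in the
// operator class restored.
const questionRegex =
  /\s*(?:([()^+*\/-])|([a-z]+)|((?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)(?:e[+-]?[0-9]+)?)|(\S))/g;

// Hypothetical helper: for every match, return the text of the capture group
// that matched (exactly one of groups 1..4 is defined per match).
function tokenizeWithQuestionRegex(input) {
  const tokens = [];
  for (const m of input.matchAll(questionRegex)) {
    tokens.push(m.slice(1).find((g) => g !== undefined));
  }
  return tokens;
}

console.log(tokenizeWithQuestionRegex("-cos(x)+4ln(x^2.2)^4=-2(pi+x)").join(" "));
// - cos ( x ) + 4 ln ( x ^ 2.2 ) ^ 4 = - 2 ( pi + x )
```

The output shows all three issues. No * follows the 4 or the 2, and both minus signs come out as separate tokens.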

Answer:

First of all, a regular expression cannot match a character that is not in the input, such as an implied *. So a plain regular expression match will never make those omitted * characters appear in the output.

However, you can make the regular expression produce a zero-length match at every spot where a * was omitted.

That part of the regex could look like this:

  • ((?<=\d)(?=\s*[a-z(])): match the position right after a digit when the first non-space character that follows is a letter or an opening parenthesis.
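You can try the zero-length part on its own. The following is a minimal sketch, assuming a JavaScript runtime with lookbehind and matchAll support (ES2020 or later). The helper name impliedMultiplicationPositions is hypothetical.

The last line feeds the same pattern to replace(), which inserts a real * at each position. That is a preprocessing variant, not what the answer itself proposes.

```javascript
// Zero-length pattern from the answer: a position right after a digit where
// the next non-space character is a letter or "(".
const impliedTimes = /(?<=\d)(?=\s*[a-z(])/g;

// Hypothetical helper: string indices of every implied "*".
function impliedMultiplicationPositions(input) {
  return [...input.matchAll(impliedTimes)].map((m) => m.index);
}

const expr = "-cos(x)+4ln(x^2.2)^4=-2(pi+x)";
console.log(impliedMultiplicationPositions(expr)); // [ 9, 23 ]

// Preprocessing variant: turn each zero-length match into an explicit "*".
console.log(expr.replace(impliedTimes, "*")); // -cos(x)+4*ln(x^2.2)^4=-2*(pi+x)
```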

Next comes the unary minus operator, which could be identified as follows:

  • ((?:^|(?<=[=(^]))(?:\s*-)+)?\s*: this requires the minus to occur at the start of the input or right after an opening parenthesis, an equals sign or an exponent operator (allowing for white space in between). We allow several unary minuses in a row, although that has no practical use.
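A quick way to check which minus signs the prefix treats as unary is to follow it with a simple operand. This sketch assumes ES2020 or later. The helper unaryMinusTokens is hypothetical, and the operand part `[a-z]+|\d+(?:\.\d*)?` is a simplification of the answer's name and number groups.

```javascript
// Unary-minus prefix from the answer, followed by a simplified operand
// (letters or a number) so each match shows a minus treated as unary.
const unaryMinusOperand =
  /(?:^|(?<=[=(^]))(?:\s*-)+\s*(?:[a-z]+|\d+(?:\.\d*)?)/g;

// Hypothetical helper: the unary-minus-plus-operand substrings found.
function unaryMinusTokens(input) {
  return [...input.matchAll(unaryMinusOperand)].map((m) => m[0]);
}

console.log(unaryMinusTokens("-cos(x)+4ln(x^2.2)^4=-2(pi+x)")); // [ '-cos', '-2' ]
// The binary minus signs in "x-2" and "y-1" are not matched.
console.log(unaryMinusTokens("x-2=-3(y-1)")); // [ '-3' ]
```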

The other parts can essentially stay as you had them. Here is the complete regex:

((?<=\d)(?=\s*[a-z(]))|(?:((?:^|(?<=[=(^]))(?:\s*-)+)?\s*(?:([a-z]+)|(\.\d+|\d+(?:\.\d*)?(?:e[+-]?\d+)?)))|\s*([+*^()=\/-])|\s*(\S)
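There is one caveat when iterating this regex in JavaScript. After a zero-length match, matchAll() moves one character forward, so the character right after an implied * is skipped. With the example input, `4ln` would yield `n` instead of `ln`.

The minimal sketch below sidesteps this, assuming ES2020 or later. It works in two passes:

1. Apply the zero-length pattern with replace().
2. Tokenize with the remaining alternatives of the complete regex, with the capture groups renumbered.

The two-pass split and the name tokenize are this sketch's own choices, not part of the answer.

```javascript
// Pass 1: the answer's zero-length "implied *" pattern, applied with replace()
// so that an explicit "*" is inserted at each such position.
const impliedTimesPattern = /(?<=\d)(?=\s*[a-z(])/g;

// Pass 2: the answer's complete regex without its first (zero-length)
// alternative. Groups: 1 = unary minus(es), 2 = name, 3 = number,
// 4 = operator, 5 = any other non-space character.
const tokenPattern =
  /(?:((?:^|(?<=[=(^]))(?:\s*-)+)?\s*(?:([a-z]+)|(\.\d+|\d+(?:\.\d*)?(?:e[+-]?\d+)?)))|\s*([+*^()=\/-])|\s*(\S)/g;

function tokenize(input) {
  const withTimes = input.replace(impliedTimesPattern, "*");
  const tokens = [];
  for (const [, minus, name, num, op, other] of withTimes.matchAll(tokenPattern)) {
    if (name !== undefined || num !== undefined) {
      // Attach any unary minus(es) to the operand, dropping spaces between them.
      tokens.push((minus ?? "").replace(/\s+/g, "") + (name ?? num));
    } else {
      tokens.push(op ?? other);
    }
  }
  return tokens;
}

console.log(tokenize("-cos(x)+4ln(x^2.2)^4=-2(pi+x)").join(" "));
// -cos ( x ) + 4 * ln ( x ^ 2.2 ) ^ 4 = -2 * ( pi + x )
```

This also yields a * between -2 and (. That matches the second problem listed in the question, even though the desired output shown there omits it.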

Answer:

const signArr = ['+', '-', '\\', '*', '^', '(', ')', '{', '}', '[', ']', '=']; // Used to define mathematical symbols

const signRegex = new RegExp('[\\' + signArr.join('\\') + ']', 'g'); // Backslash-escapes every symbol inside [...]

This generates the regular expression /[\+\-\\\*\^\(\)\{\}\[\]\=]/g for the symbols. Then insert a line break after each symbol:

const txt = "-cos(x)+4ln(x^2.2)^4=-2(pi+x)";
txt.replace(signRegex, '$&\n');
Output: '-\ncos(\nx)\n+\n4ln(\nx^\n2.2)\n^\n4=\n-\n2(\npi+\nx)\n'

Similarly, create an array of mathematical keywords such as 'cos', 'sin', 'tan', 'log', 'ln', 'pi', and another array of variables such as 'x', 'y', 'z' (whatever you are using).

I suggest arrays because they make it easy to add more entries in the future.
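One way to turn those arrays into a regular expression is to escape each entry and join the entries into an alternation. This is a minimal sketch; the array contents and the names escapeRegExp and wordRegex are illustrative assumptions.

Sorting longest-first matters when one entry is a prefix of another. JavaScript alternation takes the first alternative that matches, not the longest one.

```javascript
// Illustrative keyword and variable lists; extend them as needed.
const keywords = ["cos", "sin", "tan", "log", "ln", "pi"];
const variables = ["x", "y", "z"];

// Escape regex metacharacters so every entry is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Longest entries first, so that e.g. "pi" would be tried before a
// single-letter variable "p" if one were added.
const alternation = [...keywords, ...variables]
  .sort((a, b) => b.length - a.length)
  .map(escapeRegExp)
  .join("|");
const wordRegex = new RegExp(alternation, "g");

console.log("-cos(x)+4ln(x^2.2)^4=-2(pi+x)".match(wordRegex));
// [ 'cos', 'x', 'ln', 'x', 'pi', 'x' ]
```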

Then create another regular expression for numbers, which should accept both the \d*\.\d+ and the \d+ formats. For example:

const numberRegex = /(\d*\.\d+)|(\d+)/g;
txt.replace(numberRegex, '$&\n');
Output: '-cos(x)+4\nln(x^2.2\n)^4\n=-2\n(pi+x)'

I hope this illustrates the approach. Have a good day!
