I wrote a tail recursive scanner for basic arithmetic expressions in OCaml
Syntax
Exp ::= n | Exp Op Exp | (Exp)
Op ::= | - | * | /
type token =
| Tkn_NUM of int
| Tkn_OP of string
| Tkn_LPAR
| Tkn_RPAR
| Tkn_END
exception ParseError of string * string
let tail_tokenize s =
let rec tokenize_rec s pos lt =
if pos < 0 then lt
else
let c = String.sub s pos 1 in
match c with
| " " -> tokenize_rec s (pos-1) lt
| "(" -> tokenize_rec s (pos-1) (Tkn_LPAR::lt)
| ")" -> tokenize_rec s (pos-1) (Tkn_RPAR::lt)
| " " | "-" | "*" | "/" -> tokenize_rec s (pos-1) ((Tkn_OP c)::lt)
| "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ->
(match lt with
| (Tkn_NUM n)::lt' ->
(let lta = Tkn_NUM(int_of_string (c^(string_of_int n)))::lt' in
tokenize_rec s (pos-1) lta)
| _ -> tokenize_rec s (pos-1) (Tkn_NUM (int_of_string c)::lt)
)
|_ -> raise (ParseError ("Tokenizer","unknown symbol: "^c))
in
tokenize_rec s (String.length s) [Tkn_END]
During execution I get
tail_tokenize "3 4";;
Exception: Invalid_argument "String.sub / Bytes.sub".
CodePudding user response:
Your example case is this:
tail_tokenize "3 4"
The first call will look like this:
tokenize_rec "3 4" 3 Tkn_END
Since 3 is not less than 0, the first call inside tokenize_rec
will look like this:
String.sub "3 4" 3 1
If you try this yourself you'll see that it's invalid:
# String.sub "3 4" 3 1;;
Exception: Invalid_argument "String.sub / Bytes.sub".
It seems a little strange to work through the string backwards, but to do this you need to start at String.length s - 1
.
CodePudding user response:
From the error message it's clear that String.sub
is the problem. Its arguments are s
, pos
and 1
with the last being a constant and the two others coming straight from the function arguments. It might be a good idea to run this in isolation with the arguments substituted for the actual values:
let s = "3 4" in
String.sub s (String.length s) 1
Doing so we again get the same error, and hopefully it's now clear why: You're trying to get a substring of length 1 from the last character, meaning it will try to go past the end of the string, which of course it can't.
Logically, you might try to subtract 1 from pos
then, so that it takes a substring of length 1 starting from before the last character. But again you get the same error. That is because your terminating condition is pos < 0
, which means you'll try to run String sub s (0 - 1) 1
. Therefore you need to adjust the terminating condition too. But once you've done that you should be good!