Home > Back-end >  Exception: Invalid_argument "String.sub / Bytes.sub"
Exception: Invalid_argument "String.sub / Bytes.sub"

Time:02-19

I wrote a tail recursive scanner for basic arithmetic expressions in OCaml

Syntax

Exp ::= n | Exp Op Exp | (Exp)

Op ::=   | - | * | /
type token =
| Tkn_NUM of int
| Tkn_OP of string
| Tkn_LPAR
| Tkn_RPAR
| Tkn_END

exception ParseError of string * string

let tail_tokenize s =
  let rec tokenize_rec s pos lt =
    if pos < 0 then lt
    else
      let c = String.sub s pos 1 in
      match c with
      | " " -> tokenize_rec s (pos-1) lt
      | "(" -> tokenize_rec s (pos-1) (Tkn_LPAR::lt)
      | ")" -> tokenize_rec s (pos-1) (Tkn_RPAR::lt)
      | " " | "-" | "*" | "/" -> tokenize_rec s (pos-1) ((Tkn_OP c)::lt)
      | "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ->
        (match lt with
         | (Tkn_NUM n)::lt' -> 
           (let lta = Tkn_NUM(int_of_string (c^(string_of_int n)))::lt' in
            tokenize_rec s (pos-1) lta)
         | _ -> tokenize_rec s (pos-1) (Tkn_NUM (int_of_string c)::lt) 
         )
      |_ -> raise (ParseError ("Tokenizer","unknown symbol: "^c))
  in
  tokenize_rec s (String.length s) [Tkn_END]

During execution I get

tail_tokenize "3 4";;
Exception: Invalid_argument "String.sub / Bytes.sub".

CodePudding user response:

Your example case is this:

tail_tokenize "3 4"

The first call will look like this:

tokenize_rec "3 4" 3 Tkn_END

Since 3 is not less than 0, the first call inside tokenize_rec will look like this:

String.sub "3 4" 3 1

If you try this yourself you'll see that it's invalid:

# String.sub "3 4" 3 1;;
Exception: Invalid_argument "String.sub / Bytes.sub".

It seems a little strange to work through the string backwards, but to do this you need to start at String.length s - 1.

CodePudding user response:

From the error message it's clear that String.sub is the problem. Its arguments are s, pos and 1 with the last being a constant and the two others coming straight from the function arguments. It might be a good idea to run this in isolation with the arguments substituted for the actual values:

let s = "3 4" in
String.sub s (String.length s) 1

Doing so we again get the same error, and hopefully it's now clear why: You're trying to get a substring of length 1 from the last character, meaning it will try to go past the end of the string, which of course it can't.

Logically, you might try to subtract 1 from pos then, so that it takes a substring of length 1 starting from before the last character. But again you get the same error. That is because your terminating condition is pos < 0, which means you'll try to run String sub s (0 - 1) 1. Therefore you need to adjust the terminating condition too. But once you've done that you should be good!

  • Related