How to define a regex-matched string type of variable length in Typescript?

Time:12-13

How to create a type representing a hexadecimal string?

let str: ByteString = "f1afe3"; // valid
let str1: ByteString = "fa1"    // invalid, hex string length should be even
let str2: ByteString = "hello"  // invalid, only hex digits are allowed
let str3: ByteString = "ffeeaa3300"; // valid

I found examples here and here, but both of them only allow fixed-length strings. Is it possible to extend them to arbitrarily long strings?

CodePudding user response:

There is no specific type in TypeScript that works this way. There was a longstanding feature request at microsoft/TypeScript#6579 for regular expression validated string types, where you could presumably just write something like

// INVALID TYPESCRIPT, DO NOT TRY THIS:
type ByteString = r/^(?:[0-9a-fA-F][0-9a-fA-F])*$/

and it would just work. Unfortunately there are no such types in TypeScript. You can't do this via a regular expression in the type system.
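
For comparison, the same pattern is easy to enforce at runtime, where regular expressions are available. Here is a minimal sketch, assuming a hypothetical helper named isByteString (it is not part of TypeScript or of the question). It only checks values while the program runs, so it gives none of the compile-time guarantees the question is after:

```typescript
// Hypothetical runtime check; the type system plays no part here.
// The pattern is the one from the invalid type above:
// zero or more pairs of hexadecimal digits.
const BYTE_STRING_PATTERN = /^(?:[0-9a-fA-F][0-9a-fA-F])*$/;

function isByteString(candidate: string): boolean {
    return BYTE_STRING_PATTERN.test(candidate);
}

console.log(isByteString("f1afe3"));     // true
console.log(isByteString("fa1"));        // false: odd length
console.log(isByteString("hello"));      // false: non-hex characters
console.log(isByteString("ffeeaa3300")); // true
```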


The issue above was closed after the introduction of template literal types since they allow for manipulation of string literal types. We could try to use template literal types for your use case, but there are a few major problems that stop us.

First, you could generate a union type for a valid hexadecimal digit:

type _HexDigit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' |
    'A' | 'B' | 'C' | 'D' | 'E' | 'F'
type HexDigit = _HexDigit | Lowercase<_HexDigit>;

And then write a type for valid pairs of digits:

type Nybble = `${HexDigit}${HexDigit}`;
/* type Nybble = "00" | "01" | "02" | "03" | "04" | "05" | "06" | "07" | "08" | "09" | 
  "0A" | "0B" | "0C" | "0D" | "0E" | "0F" | "0a" | "0b" | "0c" | "0d" | "0e" | "0f" | 
  "10" | "11" | "12" | "13" | "14" | "15" | "16" | "17" | "18" | "19" | 
  "1A" | "1B" | "1C" | "1D" | "1E" | "1F" | "1a" | "1b" | "1c" | "1d" | "1e" | "1f" | 
  "20" | "21" | "22" | "23" | "24" | "25" | "26" | "27" | "28" | "29" | 
  "2A" | "2B" | "2C" | "2D" | "2E" | "2F" | ... */

But then you get stuck trying to proceed much further. The language doesn't let you write circular template literal types, so you can't do this:

type ByteStringX = "" | `${Nybble}${ByteStringX}`; // error!
//   ~~~~~~~~~~~ <--
// Type alias 'ByteStringX' circularly references itself.

You could try unrolling that loop, but as soon as you try, the compiler complains that the resulting union is too big:

type ByteString0 = "" | Nybble | `${Nybble}${Nybble}`; // error! 
// ----------------------------> ~~~~~~~~~~~~~~~~~~~~
// Expression produces a union type that is too complex to represent.

Unions in TypeScript can only hold something like 100,000 members... and there are over 230,000 valid four-digit hex strings. So we're stuck. There's just no way to write ByteString as a specific type.
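
The arithmetic behind those numbers can be checked directly. This is only a sketch of the counting; the variable names are made up for illustration. HexDigit as defined above has 22 members, because the letters appear in both upper and lower case:

```typescript
// 10 decimal digits plus A-F in both cases gives 22 members in HexDigit.
const hexDigitCount = 10 + 6 + 6;

// Every extra character multiplies the number of union members by 22.
const pairCount = hexDigitCount ** 2;      // members of Nybble
const fourDigitCount = hexDigitCount ** 4; // members of `${Nybble}${Nybble}`

console.log(pairCount);      // 484
console.log(fourDigitCount); // 234256
```

So two pairs already overshoot the limit, and every further pair multiplies the count by another 484.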


What we can do is write a generic type ByteString<T> that validates a candidate string literal type T such that T extends ByteString<T> if and only if T is a valid byte string type. So it behaves like a constraint instead of a type. In order to use it, we'd need a helper function to infer the generic type argument T. That is, instead of

type ByteString = ...
const x: ByteString = "..."

you'd have

type ByteString<T extends string> = ...
const byteString = <T extends string>(str: ...) => ...
const x = byteString("...");

Here's how it could work:

type ByteString<T extends string, A extends string = ""> =
    T extends "" ? A :
    T extends `${infer D0 extends HexDigit}${infer D1 extends HexDigit}${infer R}`
    ? ByteString<R, `${A}${D0}${D1}`> : `${A}${Nybble}`

const byteString = <T extends string>(
    str: T extends ByteString<T> ? T : ByteString<T>
) => str;

So ByteString<T> is a tail-recursive conditional type, with A accumulating the digits that have been accepted so far. If T is the empty string then we accept it and return A; that's the base case. Otherwise we try to parse T into the first two hexadecimal digits D0 and D1, and then the rest of the string R. If that parsing works, then we recurse by calculating ByteString<R, `${A}${D0}${D1}`>. If that parsing fails, then we return a "close" valid string, `${A}${Nybble}`, so that failures produce a hopefully helpful error message. When all of T parses, the result is T itself, so T extends ByteString<T> holds.
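
If the type-level recursion is hard to follow, it may help to see the same steps written as an ordinary function. The sketch below is a hypothetical runtime analogue (checkByteString and CheckResult are names invented here, not part of the solution): it consumes two characters at a time, keeps the accepted prefix in an accumulator, and stops at the first pair that is not valid hex.

```typescript
// Hypothetical runtime analogue of the recursive type, for illustration only.
const HEX_DIGITS = "0123456789abcdefABCDEF";

interface CheckResult {
    ok: boolean;         // true when the whole input parsed as hex pairs
    validPrefix: string; // the pairs consumed before parsing stopped
}

function isHexDigit(c: string): boolean {
    // charAt returns "" past the end of the string, so require exactly one character.
    return c.length === 1 && HEX_DIGITS.indexOf(c) !== -1;
}

function checkByteString(rest: string, accepted: string = ""): CheckResult {
    // Base case: nothing left to parse, so everything consumed so far is valid.
    if (rest === "") {
        return { ok: true, validPrefix: accepted };
    }
    const d0 = rest.charAt(0);
    const d1 = rest.charAt(1);
    if (isHexDigit(d0) && isHexDigit(d1)) {
        // Recursive step: move the pair into the accumulator and continue.
        return checkByteString(rest.slice(2), accepted + d0 + d1);
    }
    // Failure: report the prefix that parsed. The type version appends
    // Nybble to this prefix to build its "close" suggestion.
    return { ok: false, validPrefix: accepted };
}

console.log(checkByteString("f1afe3")); // { ok: true, validPrefix: 'f1afe3' }
console.log(checkByteString("fa1"));    // { ok: false, validPrefix: 'fa' }
console.log(checkByteString("hello"));  // { ok: false, validPrefix: '' }
```

The accepted parameter plays the role of A, and rest.slice(2) plays the role of R.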

Let's test it out:

let str = byteString("f1afe3"); // okay
let str1 = byteString("fa1"); // error!
// Argument of type '"fa1"' is not assignable to parameter of type 
// '"fa00" | "fa01" | "fa02" | "fa03" | "fa04" | "fa05" | "fa06" | ...
let str2 = byteString("hello"); // error!
// Argument of type '"hello"' is not assignable to parameter of type '"00" | "01" | 
// "02" | "03" | "04" | "05" | "06" | "07" | "08" | "09" | ...
let str3 = byteString("ffeeaa3300"); // okay

This behaves how you want. str and str3 are accepted, while str1 and str2 generate error messages about how the input is inappropriate. "fa1" fails and is compared to `fa${Nybble}`, since "1" is not a valid pair of digits. And "hello" fails and is compared to `${Nybble}`, since "he" is not a valid pair of digits.


So that's the closest we can get to what you want. If it works for your use case, that's great.

If not, then there's still an open issue for regular expression validated string types at microsoft/TypeScript#41160. You might want to go to that issue, give it a
