PHP "supported scripts" (e.g. Cyrillic, Han etc) - definition of the unicode ranges for th

Time:08-16

I see in the PHP online manual (https://www.php.net/manual/en/regexp.reference.unicode.php) that there is a long list of predefined "supported scripts", which is very helpful for detecting whether a string contains, say, Cyrillic or Han characters. However, that list doesn't define which Unicode code points each script covers. Many are "obvious" (or one might expect them to coincide with the Unicode blocks, https://en.wikipedia.org/wiki/Unicode_block), but several are not, such as "Common" and "Inherited". These are mentioned on that Wikipedia page about Unicode blocks, but without the specifics: e.g. "Inherited (2 characters)" in the block range U+0400..U+04FF, without saying which 2 characters in that range. Outside of the Zend source code, is there a public specification of the ranges covered by these predefined scripts?
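
For context, the kind of detection described above can be sketched as follows. The helper name `containsScript` is hypothetical; the only PHP feature it relies on is `\p{...}` script matching under the `/u` modifier, as listed on the manual page linked above.

```php
<?php
// Hypothetical helper: true if $text contains at least one character
// that PCRE assigns to the named script (e.g. 'Cyrillic', 'Han').
function containsScript(string $text, string $script): bool
{
    // Only allow letters and underscores so the name is safe to embed in a pattern.
    if (!preg_match('/^[A-Za-z_]+$/', $script)) {
        throw new InvalidArgumentException("Unexpected script name: $script");
    }
    return preg_match('/\p{' . $script . '}/u', $text) === 1;
}

// "\u{...}" escapes keep the example independent of the file's encoding.
var_dump(containsScript("\u{041F}\u{0440}\u{0438}\u{0432}\u{0435}\u{0442}, world", 'Cyrillic')); // bool(true)
var_dump(containsScript('Hello, world', 'Cyrillic'));                                            // bool(false)
var_dump(containsScript("\u{6F22}\u{5B57}", 'Han'));                                             // bool(true)
```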

CodePudding user response:

Yes, the data file Scripts.txt in the Unicode Standard.
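
The data lines in Scripts.txt consist of a single code point or a `start..end` range, a semicolon, the script name, and a trailing `#` comment. A minimal sketch of turning such lines into per-script ranges is shown below. The function name `parseScriptsTxt` is hypothetical, and the sample lines are an illustrative excerpt with shortened comments, not a copy of the real file.

```php
<?php
// Sketch: parse Scripts.txt-style lines into [script => [[start, end], ...]].
function parseScriptsTxt(string $data): array
{
    $ranges = [];
    foreach (preg_split('/\R/', $data) as $line) {
        // Drop the trailing comment, then skip lines that are now empty.
        $line = trim(explode('#', $line, 2)[0]);
        if ($line === '') {
            continue;
        }
        // Either "XXXX ; Script" or "XXXX..YYYY ; Script".
        if (preg_match('/^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)$/i', $line, $m)) {
            $start = hexdec($m[1]);
            $end   = $m[2] !== '' ? hexdec($m[2]) : $start;
            $ranges[$m[3]][] = [$start, $end];
        }
    }
    return $ranges;
}

$sample = <<<'TXT'
# Illustrative excerpt only; comments shortened.
0400..0481    ; Cyrillic # sample line
0485..0486    ; Inherited # sample line
0670          ; Inherited # sample line
TXT;

foreach (parseScriptsTxt($sample) as $script => $list) {
    foreach ($list as [$start, $end]) {
        printf("%s: U+%04X..U+%04X\n", $script, $start, $end);
    }
}
```

The real file splits a script's code points into several adjacent lines (one per general category), so neighbouring ranges would need to be merged before comparing them with ranges produced some other way.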

CodePudding user response:

I decided in the end that non-PHP documentation, although very helpful (especially given the provenance of Unicode.org), won't necessarily reflect exactly how PHP implements its own supported scripts. So I wrote the following PHP, which tests every code point in the range 0-0x10ffff against each script:

<?php

$scriptNames = array(
    'Arabic',
    'Armenian',
    'Avestan',
    'Balinese',
    'Bamum',
    'Batak',
    'Bengali',
    'Bopomofo',
    'Brahmi',
    'Braille',
    'Buginese',
    'Buhid',
    'Canadian_Aboriginal',
    'Carian',
    'Chakma',
    'Cham',
    'Cherokee',
    'Common',
    'Coptic',
    'Cuneiform',
    'Cypriot',
    'Cyrillic',
    'Deseret',
    'Devanagari',
    'Egyptian_Hieroglyphs',
    'Ethiopic',
    'Georgian',
    'Glagolitic',
    'Gothic',
    'Greek',
    'Gujarati',
    'Gurmukhi',
    'Han',
    'Hangul',
    'Hanunoo',
    'Hebrew',
    'Hiragana',
    'Imperial_Aramaic',
    'Inherited',
    'Inscriptional_Pahlavi',
    'Inscriptional_Parthian',
    'Javanese',
    'Kaithi',
    'Kannada',
    'Katakana',
    'Kayah_Li',
    'Kharoshthi',
    'Khmer',
    'Lao',
    'Latin',
    'Lepcha',
    'Limbu',
    'Linear_B',
    'Lisu',
    'Lycian',
    'Lydian',
    'Malayalam',
    'Mandaic',
    'Meetei_Mayek',
    'Meroitic_Cursive',
    'Meroitic_Hieroglyphs',
    'Miao',
    'Mongolian',
    'Myanmar',
    'New_Tai_Lue',
    'Nko',
    'Ogham',
    'Old_Italic',
    'Old_Persian',
    'Old_South_Arabian',
    'Old_Turkic',
    'Ol_Chiki',
    'Oriya',
    'Osmanya',
    'Phags_Pa',
    'Phoenician',
    'Rejang',
    'Runic',
    'Samaritan',
    'Saurashtra',
    'Sharada',
    'Shavian',
    'Sinhala',
    'Sora_Sompeng',
    'Sundanese',
    'Syloti_Nagri',
    'Syriac',
    'Tagalog',
    'Tagbanwa',
    'Tai_Le',
    'Tai_Tham',
    'Tai_Viet',
    'Takri',
    'Tamil',
    'Telugu',
    'Thaana',
    'Thai',
    'Tibetan',
    'Tifinagh',
    'Ugaritic',
    'Vai',
    'Yi'
);
$scriptTypes = array();
foreach( $scriptNames as $n ) $scriptTypes[ $n ] = array();
for( $i=0; $i <= 0x10ffff; $i++ ) {

    foreach( $scriptNames as $scriptName ) {

        if ( preg_match( '/[\p{'. $scriptName .'}]/u', mb_chr( $i, 'UTF-8') ) ) {

            if (empty( $scriptTypes[ $scriptName ])
                || ( ($i - $scriptTypes[ $scriptName ][ count( $scriptTypes[ $scriptName ] ) - 1 ][1]) > 1)
            ) {

                $scriptTypes[ $scriptName ][] = [$i, $i];

            } else {

                $scriptTypes[ $scriptName ][ count( $scriptTypes[ $scriptName ] ) - 1 ][1] = $i;
            }
        }
    }
}
foreach( $scriptTypes as $scriptName => $unicodeRanges ) {

    $cc ='';
    $n=0;
    foreach( $unicodeRanges as $r ) {

        $cc .= sprintf(
            '\x{%04x}',
            $r[0]
        );
        $n++;
        if ($r[1] > $r[0] ) {

            $cc .= sprintf(
                '-\x{%04x}',
                $r[1]
            );
            $n += $r[1] - $r[0];
        }
    }
    printf(
        '%s(%d)=[%s]'.PHP_EOL,
        $scriptName,
        $n,
        $cc
    );
}

This gives the following output in the form <Name>(<code points covered>)=<PCRE character class>:

Arabic(1281)=[\x{0600}-\x{0604}\x{0606}-\x{060b}\x{060d}-\x{061a}\x{061c}\x{061e}\x{0620}-\x{063f}\x{0641}-\x{064a}\x{0656}-\x{066f}\x{0671}-\x{06dc}\x{06de}-\x{06ff}\x{0750}-\x{077f}\x{08a0}-\x{08b4}\x{08b6}-\x{08bd}\x{08d3}-\x{08e1}\x{08e3}-\x{08ff}\x{fb50}-\x{fbc1}\x{fbd3}-\x{fd3d}\x{fd50}-\x{fd8f}\x{fd92}-\x{fdc7}\x{fdf0}-\x{fdfd}\x{fe70}-\x{fe74}\x{fe76}-\x{fefc}\x{10e60}-\x{10e7e}\x{1ee00}-\x{1ee03}\x{1ee05}-\x{1ee1f}\x{1ee21}-\x{1ee22}\x{1ee24}\x{1ee27}\x{1ee29}-\x{1ee32}\x{1ee34}-\x{1ee37}\x{1ee39}\x{1ee3b}\x{1ee42}\x{1ee47}\x{1ee49}\x{1ee4b}\x{1ee4d}-\x{1ee4f}\x{1ee51}-\x{1ee52}\x{1ee54}\x{1ee57}\x{1ee59}\x{1ee5b}\x{1ee5d}\x{1ee5f}\x{1ee61}-\x{1ee62}\x{1ee64}\x{1ee67}-\x{1ee6a}\x{1ee6c}-\x{1ee72}\x{1ee74}-\x{1ee77}\x{1ee79}-\x{1ee7c}\x{1ee7e}\x{1ee80}-\x{1ee89}\x{1ee8b}-\x{1ee9b}\x{1eea1}-\x{1eea3}\x{1eea5}-\x{1eea9}\x{1eeab}-\x{1eebb}\x{1eef0}-\x{1eef1}]
Armenian(95)=[\x{0531}-\x{0556}\x{0559}-\x{0588}\x{058a}\x{058d}-\x{058f}\x{fb13}-\x{fb17}]
Avestan(61)=[\x{10b00}-\x{10b35}\x{10b39}-\x{10b3f}]
Balinese(121)=[\x{1b00}-\x{1b4b}\x{1b50}-\x{1b7c}]
Bamum(657)=[\x{a6a0}-\x{a6f7}\x{16800}-\x{16a38}]
Batak(56)=[\x{1bc0}-\x{1bf3}\x{1bfc}-\x{1bff}]
Bengali(96)=[\x{0980}-\x{0983}\x{0985}-\x{098c}\x{098f}-\x{0990}\x{0993}-\x{09a8}\x{09aa}-\x{09b0}\x{09b2}\x{09b6}-\x{09b9}\x{09bc}-\x{09c4}\x{09c7}-\x{09c8}\x{09cb}-\x{09ce}\x{09d7}\x{09dc}-\x{09dd}\x{09df}-\x{09e3}\x{09e6}-\x{09fe}]
Bopomofo(72)=[\x{02ea}-\x{02eb}\x{3105}-\x{312f}\x{31a0}-\x{31ba}]
Brahmi(109)=[\x{11000}-\x{1104d}\x{11052}-\x{1106f}\x{1107f}]
Braille(256)=[\x{2800}-\x{28ff}]
Buginese(30)=[\x{1a00}-\x{1a1b}\x{1a1e}-\x{1a1f}]
Buhid(20)=[\x{1740}-\x{1753}]
Canadian_Aboriginal(710)=[\x{1400}-\x{167f}\x{18b0}-\x{18f5}]
Carian(49)=[\x{102a0}-\x{102d0}]
Chakma(70)=[\x{11100}-\x{11134}\x{11136}-\x{11146}]
Cham(83)=[\x{aa00}-\x{aa36}\x{aa40}-\x{aa4d}\x{aa50}-\x{aa59}\x{aa5c}-\x{aa5f}]
Cherokee(172)=[\x{13a0}-\x{13f5}\x{13f8}-\x{13fd}\x{ab70}-\x{abbf}]
Common(7805)=[\x{0000}-\x{0040}\x{005b}-\x{0060}\x{007b}-\x{00a9}\x{00ab}-\x{00b9}\x{00bb}-\x{00bf}\x{00d7}\x{00f7}\x{02b9}-\x{02df}\x{02e5}-\x{02e9}\x{02ec}-\x{02ff}\x{0374}\x{037e}\x{0385}\x{0387}\x{0589}\x{0605}\x{060c}\x{061b}\x{061f}\x{0640}\x{06dd}\x{08e2}\x{0964}-\x{0965}\x{0e3f}\x{0fd5}-\x{0fd8}\x{10fb}\x{16eb}-\x{16ed}\x{1735}-\x{1736}\x{1802}-\x{1803}\x{1805}\x{1cd3}\x{1ce1}\x{1ce9}-\x{1cec}\x{1cee}-\x{1cf3}\x{1cf5}-\x{1cf7}\x{1cfa}\x{2000}-\x{200b}\x{200e}-\x{2064}\x{2066}-\x{2070}\x{2074}-\x{207e}\x{2080}-\x{208e}\x{20a0}-\x{20bf}\x{2100}-\x{2125}\x{2127}-\x{2129}\x{212c}-\x{2131}\x{2133}-\x{214d}\x{214f}-\x{215f}\x{2189}-\x{218b}\x{2190}-\x{2426}\x{2440}-\x{244a}\x{2460}-\x{27ff}\x{2900}-\x{2b73}\x{2b76}-\x{2b95}\x{2b98}-\x{2bff}\x{2e00}-\x{2e4f}\x{2ff0}-\x{2ffb}\x{3000}-\x{3004}\x{3006}\x{3008}-\x{3020}\x{3030}-\x{3037}\x{303c}-\x{303f}\x{309b}-\x{309c}\x{30a0}\x{30fb}-\x{30fc}\x{3190}-\x{319f}\x{31c0}-\x{31e3}\x{3220}-\x{325f}\x{327f}-\x{32cf}\x{32ff}\x{3358}-\x{33ff}\x{4dc0}-\x{4dff}\x{a700}-\x{a721}\x{a788}-\x{a78a}\x{a830}-\x{a839}\x{a92e}\x{a9cf}\x{ab5b}\x{fd3e}-\x{fd3f}\x{fe10}-\x{fe19}\x{fe30}-\x{fe52}\x{fe54}-\x{fe66}\x{fe68}-\x{fe6b}\x{feff}\x{ff01}-\x{ff20}\x{ff3b}-\x{ff40}\x{ff5b}-\x{ff65}\x{ff70}\x{ff9e}-\x{ff9f}\x{ffe0}-\x{ffe6}\x{ffe8}-\x{ffee}\x{fff9}-\x{fffd}\x{10100}-\x{10102}\x{10107}-\x{10133}\x{10137}-\x{1013f}\x{10190}-\x{1019b}\x{101d0}-\x{101fc}\x{102e1}-\x{102fb}\x{16fe2}-\x{16fe3}\x{1bca0}-\x{1bca3}\x{1d000}-\x{1d0f5}\x{1d100}-\x{1d126}\x{1d129}-\x{1d166}\x{1d16a}-\x{1d17a}\x{1d183}-\x{1d184}\x{1d18c}-\x{1d1a9}\x{1d1ae}-\x{1d1e8}\x{1d2e0}-\x{1d2f3}\x{1d300}-\x{1d356}\x{1d360}-\x{1d378}\x{1d400}-\x{1d454}\x{1d456}-\x{1d49c}\x{1d49e}-\x{1d49f}\x{1d4a2}\x{1d4a5}-\x{1d4a6}\x{1d4a9}-\x{1d4ac}\x{1d4ae}-\x{1d4b9}\x{1d4bb}\x{1d4bd}-\x{1d4c3}\x{1d4c5}-\x{1d505}\x{1d507}-\x{1d50a}\x{1d50d}-\x{1d514}\x{1d516}-\x{1d51c}\x{1d51e}-\x{1d539}\x{1d53b}-\x{1d53e}\x{1d540}-\x{1d544}\x{1d546}\x{1d54a}-\x{1d550}\x{1d552}-\x{1d6a5}\x{1d6a8}-\x{1d7cb}\x{1d7ce}-\x{1d7ff}\x{1ec71}-\x{1ecb4}\x{1ed01}-\x{1ed3d}\x{1f000}-\x{1f02b}\x{1f030}-\x{1f093}\x{1f0a0}-\x{1f0ae}\x{1f0b1}-\x{1f0bf}\x{1f0c1}-\x{1f0cf}\x{1f0d1}-\x{1f0f5}\x{1f100}-\x{1f10c}\x{1f110}-\x{1f16c}\x{1f170}-\x{1f1ac}\x{1f1e6}-\x{1f1ff}\x{1f201}-\x{1f202}\x{1f210}-\x{1f23b}\x{1f240}-\x{1f248}\x{1f250}-\x{1f251}\x{1f260}-\x{1f265}\x{1f300}-\x{1f6d5}\x{1f6e0}-\x{1f6ec}\x{1f6f0}-\x{1f6fa}\x{1f700}-\x{1f773}\x{1f780}-\x{1f7d8}\x{1f7e0}-\x{1f7eb}\x{1f800}-\x{1f80b}\x{1f810}-\x{1f847}\x{1f850}-\x{1f859}\x{1f860}-\x{1f887}\x{1f890}-\x{1f8ad}\x{1f900}-\x{1f90b}\x{1f90d}-\x{1f971}\x{1f973}-\x{1f976}\x{1f97a}-\x{1f9a2}\x{1f9a5}-\x{1f9aa}\x{1f9ae}-\x{1f9ca}\x{1f9cd}-\x{1fa53}\x{1fa60}-\x{1fa6d}\x{1fa70}-\x{1fa73}\x{1fa78}-\x{1fa7a}\x{1fa80}-\x{1fa82}\x{1fa90}-\x{1fa95}\x{e0001}\x{e0020}-\x{e007f}]
Coptic(137)=[\x{03e2}-\x{03ef}\x{2c80}-\x{2cf3}\x{2cf9}-\x{2cff}]
Cuneiform(1234)=[\x{12000}-\x{12399}\x{12400}-\x{1246e}\x{12470}-\x{12474}\x{12480}-\x{12543}]
Cypriot(55)=[\x{10800}-\x{10805}\x{10808}\x{1080a}-\x{10835}\x{10837}-\x{10838}\x{1083c}\x{1083f}]
Cyrillic(443)=[\x{0400}-\x{0484}\x{0487}-\x{052f}\x{1c80}-\x{1c88}\x{1d2b}\x{1d78}\x{2de0}-\x{2dff}\x{a640}-\x{a69f}\x{fe2e}-\x{fe2f}]
Deseret(80)=[\x{10400}-\x{1044f}]
Devanagari(154)=[\x{0900}-\x{0950}\x{0955}-\x{0963}\x{0966}-\x{097f}\x{a8e0}-\x{a8ff}]
Egyptian_Hieroglyphs(1080)=[\x{13000}-\x{1342e}\x{13430}-\x{13438}]
Ethiopic(495)=[\x{1200}-\x{1248}\x{124a}-\x{124d}\x{1250}-\x{1256}\x{1258}\x{125a}-\x{125d}\x{1260}-\x{1288}\x{128a}-\x{128d}\x{1290}-\x{12b0}\x{12b2}-\x{12b5}\x{12b8}-\x{12be}\x{12c0}\x{12c2}-\x{12c5}\x{12c8}-\x{12d6}\x{12d8}-\x{1310}\x{1312}-\x{1315}\x{1318}-\x{135a}\x{135d}-\x{137c}\x{1380}-\x{1399}\x{2d80}-\x{2d96}\x{2da0}-\x{2da6}\x{2da8}-\x{2dae}\x{2db0}-\x{2db6}\x{2db8}-\x{2dbe}\x{2dc0}-\x{2dc6}\x{2dc8}-\x{2dce}\x{2dd0}-\x{2dd6}\x{2dd8}-\x{2dde}\x{ab01}-\x{ab06}\x{ab09}-\x{ab0e}\x{ab11}-\x{ab16}\x{ab20}-\x{ab26}\x{ab28}-\x{ab2e}]
Georgian(173)=[\x{10a0}-\x{10c5}\x{10c7}\x{10cd}\x{10d0}-\x{10fa}\x{10fc}-\x{10ff}\x{1c90}-\x{1cba}\x{1cbd}-\x{1cbf}\x{2d00}-\x{2d25}\x{2d27}\x{2d2d}]
Glagolitic(132)=[\x{2c00}-\x{2c2e}\x{2c30}-\x{2c5e}\x{1e000}-\x{1e006}\x{1e008}-\x{1e018}\x{1e01b}-\x{1e021}\x{1e023}-\x{1e024}\x{1e026}-\x{1e02a}]
Gothic(27)=[\x{10330}-\x{1034a}]
Greek(518)=[\x{0370}-\x{0373}\x{0375}-\x{0377}\x{037a}-\x{037d}\x{037f}\x{0384}\x{0386}\x{0388}-\x{038a}\x{038c}\x{038e}-\x{03a1}\x{03a3}-\x{03e1}\x{03f0}-\x{03ff}\x{1d26}-\x{1d2a}\x{1d5d}-\x{1d61}\x{1d66}-\x{1d6a}\x{1dbf}\x{1f00}-\x{1f15}\x{1f18}-\x{1f1d}\x{1f20}-\x{1f45}\x{1f48}-\x{1f4d}\x{1f50}-\x{1f57}\x{1f59}\x{1f5b}\x{1f5d}\x{1f5f}-\x{1f7d}\x{1f80}-\x{1fb4}\x{1fb6}-\x{1fc4}\x{1fc6}-\x{1fd3}\x{1fd6}-\x{1fdb}\x{1fdd}-\x{1fef}\x{1ff2}-\x{1ff4}\x{1ff6}-\x{1ffe}\x{2126}\x{ab65}\x{10140}-\x{1018e}\x{101a0}\x{1d200}-\x{1d245}]
Gujarati(91)=[\x{0a81}-\x{0a83}\x{0a85}-\x{0a8d}\x{0a8f}-\x{0a91}\x{0a93}-\x{0aa8}\x{0aaa}-\x{0ab0}\x{0ab2}-\x{0ab3}\x{0ab5}-\x{0ab9}\x{0abc}-\x{0ac5}\x{0ac7}-\x{0ac9}\x{0acb}-\x{0acd}\x{0ad0}\x{0ae0}-\x{0ae3}\x{0ae6}-\x{0af1}\x{0af9}-\x{0aff}]
Gurmukhi(80)=[\x{0a01}-\x{0a03}\x{0a05}-\x{0a0a}\x{0a0f}-\x{0a10}\x{0a13}-\x{0a28}\x{0a2a}-\x{0a30}\x{0a32}-\x{0a33}\x{0a35}-\x{0a36}\x{0a38}-\x{0a39}\x{0a3c}\x{0a3e}-\x{0a42}\x{0a47}-\x{0a48}\x{0a4b}-\x{0a4d}\x{0a51}\x{0a59}-\x{0a5c}\x{0a5e}\x{0a66}-\x{0a76}]
Han(89233)=[\x{2e80}-\x{2e99}\x{2e9b}-\x{2ef3}\x{2f00}-\x{2fd5}\x{3005}\x{3007}\x{3021}-\x{3029}\x{3038}-\x{303b}\x{3400}-\x{4db5}\x{4e00}-\x{9fef}\x{f900}-\x{fa6d}\x{fa70}-\x{fad9}\x{20000}-\x{2a6d6}\x{2a700}-\x{2b734}\x{2b740}-\x{2b81d}\x{2b820}-\x{2cea1}\x{2ceb0}-\x{2ebe0}\x{2f800}-\x{2fa1d}]
Hangul(11739)=[\x{1100}-\x{11ff}\x{302e}-\x{302f}\x{3131}-\x{318e}\x{3200}-\x{321e}\x{3260}-\x{327e}\x{a960}-\x{a97c}\x{ac00}-\x{d7a3}\x{d7b0}-\x{d7c6}\x{d7cb}-\x{d7fb}\x{ffa0}-\x{ffbe}\x{ffc2}-\x{ffc7}\x{ffca}-\x{ffcf}\x{ffd2}-\x{ffd7}\x{ffda}-\x{ffdc}]
Hanunoo(21)=[\x{1720}-\x{1734}]
Hebrew(134)=[\x{0591}-\x{05c7}\x{05d0}-\x{05ea}\x{05ef}-\x{05f4}\x{fb1d}-\x{fb36}\x{fb38}-\x{fb3c}\x{fb3e}\x{fb40}-\x{fb41}\x{fb43}-\x{fb44}\x{fb46}-\x{fb4f}]
Hiragana(379)=[\x{3041}-\x{3096}\x{309d}-\x{309f}\x{1b001}-\x{1b11e}\x{1b150}-\x{1b152}\x{1f200}]
Imperial_Aramaic(31)=[\x{10840}-\x{10855}\x{10857}-\x{1085f}]
Inherited(571)=[\x{0300}-\x{036f}\x{0485}-\x{0486}\x{064b}-\x{0655}\x{0670}\x{0951}-\x{0954}\x{1ab0}-\x{1abe}\x{1cd0}-\x{1cd2}\x{1cd4}-\x{1ce0}\x{1ce2}-\x{1ce8}\x{1ced}\x{1cf4}\x{1cf8}-\x{1cf9}\x{1dc0}-\x{1df9}\x{1dfb}-\x{1dff}\x{200c}-\x{200d}\x{20d0}-\x{20f0}\x{302a}-\x{302d}\x{3099}-\x{309a}\x{fe00}-\x{fe0f}\x{fe20}-\x{fe2d}\x{101fd}\x{102e0}\x{1133b}\x{1d167}-\x{1d169}\x{1d17b}-\x{1d182}\x{1d185}-\x{1d18b}\x{1d1aa}-\x{1d1ad}\x{e0100}-\x{e01ef}]
Inscriptional_Pahlavi(27)=[\x{10b60}-\x{10b72}\x{10b78}-\x{10b7f}]
Inscriptional_Parthian(30)=[\x{10b40}-\x{10b55}\x{10b58}-\x{10b5f}]
Javanese(90)=[\x{a980}-\x{a9cd}\x{a9d0}-\x{a9d9}\x{a9de}-\x{a9df}]
Kaithi(67)=[\x{11080}-\x{110c1}\x{110cd}]
Kannada(89)=[\x{0c80}-\x{0c8c}\x{0c8e}-\x{0c90}\x{0c92}-\x{0ca8}\x{0caa}-\x{0cb3}\x{0cb5}-\x{0cb9}\x{0cbc}-\x{0cc4}\x{0cc6}-\x{0cc8}\x{0cca}-\x{0ccd}\x{0cd5}-\x{0cd6}\x{0cde}\x{0ce0}-\x{0ce3}\x{0ce6}-\x{0cef}\x{0cf1}-\x{0cf2}]
Katakana(304)=[\x{30a1}-\x{30fa}\x{30fd}-\x{30ff}\x{31f0}-\x{31ff}\x{32d0}-\x{32fe}\x{3300}-\x{3357}\x{ff66}-\x{ff6f}\x{ff71}-\x{ff9d}\x{1b000}\x{1b164}-\x{1b167}]
Kayah_Li(47)=[\x{a900}-\x{a92d}\x{a92f}]
Kharoshthi(68)=[\x{10a00}-\x{10a03}\x{10a05}-\x{10a06}\x{10a0c}-\x{10a13}\x{10a15}-\x{10a17}\x{10a19}-\x{10a35}\x{10a38}-\x{10a3a}\x{10a3f}-\x{10a48}\x{10a50}-\x{10a58}]
Khmer(146)=[\x{1780}-\x{17dd}\x{17e0}-\x{17e9}\x{17f0}-\x{17f9}\x{19e0}-\x{19ff}]
Lao(82)=[\x{0e81}-\x{0e82}\x{0e84}\x{0e86}-\x{0e8a}\x{0e8c}-\x{0ea3}\x{0ea5}\x{0ea7}-\x{0ebd}\x{0ec0}-\x{0ec4}\x{0ec6}\x{0ec8}-\x{0ecd}\x{0ed0}-\x{0ed9}\x{0edc}-\x{0edf}]
Latin(1366)=[\x{0041}-\x{005a}\x{0061}-\x{007a}\x{00aa}\x{00ba}\x{00c0}-\x{00d6}\x{00d8}-\x{00f6}\x{00f8}-\x{02b8}\x{02e0}-\x{02e4}\x{1d00}-\x{1d25}\x{1d2c}-\x{1d5c}\x{1d62}-\x{1d65}\x{1d6b}-\x{1d77}\x{1d79}-\x{1dbe}\x{1e00}-\x{1eff}\x{2071}\x{207f}\x{2090}-\x{209c}\x{212a}-\x{212b}\x{2132}\x{214e}\x{2160}-\x{2188}\x{2c60}-\x{2c7f}\x{a722}-\x{a787}\x{a78b}-\x{a7bf}\x{a7c2}-\x{a7c6}\x{a7f7}-\x{a7ff}\x{ab30}-\x{ab5a}\x{ab5c}-\x{ab64}\x{ab66}-\x{ab67}\x{fb00}-\x{fb06}\x{ff21}-\x{ff3a}\x{ff41}-\x{ff5a}]
Lepcha(74)=[\x{1c00}-\x{1c37}\x{1c3b}-\x{1c49}\x{1c4d}-\x{1c4f}]
Limbu(68)=[\x{1900}-\x{191e}\x{1920}-\x{192b}\x{1930}-\x{193b}\x{1940}\x{1944}-\x{194f}]
Linear_B(211)=[\x{10000}-\x{1000b}\x{1000d}-\x{10026}\x{10028}-\x{1003a}\x{1003c}-\x{1003d}\x{1003f}-\x{1004d}\x{10050}-\x{1005d}\x{10080}-\x{100fa}]
Lisu(48)=[\x{a4d0}-\x{a4ff}]
Lycian(29)=[\x{10280}-\x{1029c}]
Lydian(27)=[\x{10920}-\x{10939}\x{1093f}]
Malayalam(117)=[\x{0d00}-\x{0d03}\x{0d05}-\x{0d0c}\x{0d0e}-\x{0d10}\x{0d12}-\x{0d44}\x{0d46}-\x{0d48}\x{0d4a}-\x{0d4f}\x{0d54}-\x{0d63}\x{0d66}-\x{0d7f}]
Mandaic(29)=[\x{0840}-\x{085b}\x{085e}]
Meetei_Mayek(79)=[\x{aae0}-\x{aaf6}\x{abc0}-\x{abed}\x{abf0}-\x{abf9}]
Meroitic_Cursive(90)=[\x{109a0}-\x{109b7}\x{109bc}-\x{109cf}\x{109d2}-\x{109ff}]
Meroitic_Hieroglyphs(32)=[\x{10980}-\x{1099f}]
Miao(149)=[\x{16f00}-\x{16f4a}\x{16f4f}-\x{16f87}\x{16f8f}-\x{16f9f}]
Mongolian(167)=[\x{1800}-\x{1801}\x{1804}\x{1806}-\x{180e}\x{1810}-\x{1819}\x{1820}-\x{1878}\x{1880}-\x{18aa}\x{11660}-\x{1166c}]
Myanmar(223)=[\x{1000}-\x{109f}\x{a9e0}-\x{a9fe}\x{aa60}-\x{aa7f}]
New_Tai_Lue(83)=[\x{1980}-\x{19ab}\x{19b0}-\x{19c9}\x{19d0}-\x{19da}\x{19de}-\x{19df}]
Nko(62)=[\x{07c0}-\x{07fa}\x{07fd}-\x{07ff}]
Ogham(29)=[\x{1680}-\x{169c}]
Old_Italic(39)=[\x{10300}-\x{10323}\x{1032d}-\x{1032f}]
Old_Persian(50)=[\x{103a0}-\x{103c3}\x{103c8}-\x{103d5}]
Old_South_Arabian(32)=[\x{10a60}-\x{10a7f}]
Old_Turkic(73)=[\x{10c00}-\x{10c48}]
Ol_Chiki(48)=[\x{1c50}-\x{1c7f}]
Oriya(90)=[\x{0b01}-\x{0b03}\x{0b05}-\x{0b0c}\x{0b0f}-\x{0b10}\x{0b13}-\x{0b28}\x{0b2a}-\x{0b30}\x{0b32}-\x{0b33}\x{0b35}-\x{0b39}\x{0b3c}-\x{0b44}\x{0b47}-\x{0b48}\x{0b4b}-\x{0b4d}\x{0b56}-\x{0b57}\x{0b5c}-\x{0b5d}\x{0b5f}-\x{0b63}\x{0b66}-\x{0b77}]
Osmanya(40)=[\x{10480}-\x{1049d}\x{104a0}-\x{104a9}]
Phags_Pa(56)=[\x{a840}-\x{a877}]
Phoenician(29)=[\x{10900}-\x{1091b}\x{1091f}]
Rejang(37)=[\x{a930}-\x{a953}\x{a95f}]
Runic(86)=[\x{16a0}-\x{16ea}\x{16ee}-\x{16f8}]
Samaritan(61)=[\x{0800}-\x{082d}\x{0830}-\x{083e}]
Saurashtra(82)=[\x{a880}-\x{a8c5}\x{a8ce}-\x{a8d9}]
Sharada(94)=[\x{11180}-\x{111cd}\x{111d0}-\x{111df}]
Shavian(48)=[\x{10450}-\x{1047f}]
Sinhala(110)=[\x{0d82}-\x{0d83}\x{0d85}-\x{0d96}\x{0d9a}-\x{0db1}\x{0db3}-\x{0dbb}\x{0dbd}\x{0dc0}-\x{0dc6}\x{0dca}\x{0dcf}-\x{0dd4}\x{0dd6}\x{0dd8}-\x{0ddf}\x{0de6}-\x{0def}\x{0df2}-\x{0df4}\x{111e1}-\x{111f4}]
Sora_Sompeng(35)=[\x{110d0}-\x{110e8}\x{110f0}-\x{110f9}]
Sundanese(72)=[\x{1b80}-\x{1bbf}\x{1cc0}-\x{1cc7}]
Syloti_Nagri(44)=[\x{a800}-\x{a82b}]
Syriac(88)=[\x{0700}-\x{070d}\x{070f}-\x{074a}\x{074d}-\x{074f}\x{0860}-\x{086a}]
Tagalog(20)=[\x{1700}-\x{170c}\x{170e}-\x{1714}]
Tagbanwa(18)=[\x{1760}-\x{176c}\x{176e}-\x{1770}\x{1772}-\x{1773}]
Tai_Le(35)=[\x{1950}-\x{196d}\x{1970}-\x{1974}]
Tai_Tham(127)=[\x{1a20}-\x{1a5e}\x{1a60}-\x{1a7c}\x{1a7f}-\x{1a89}\x{1a90}-\x{1a99}\x{1aa0}-\x{1aad}]
Tai_Viet(72)=[\x{aa80}-\x{aac2}\x{aadb}-\x{aadf}]
Takri(67)=[\x{11680}-\x{116b8}\x{116c0}-\x{116c9}]
Tamil(123)=[\x{0b82}-\x{0b83}\x{0b85}-\x{0b8a}\x{0b8e}-\x{0b90}\x{0b92}-\x{0b95}\x{0b99}-\x{0b9a}\x{0b9c}\x{0b9e}-\x{0b9f}\x{0ba3}-\x{0ba4}\x{0ba8}-\x{0baa}\x{0bae}-\x{0bb9}\x{0bbe}-\x{0bc2}\x{0bc6}-\x{0bc8}\x{0bca}-\x{0bcd}\x{0bd0}\x{0bd7}\x{0be6}-\x{0bfa}\x{11fc0}-\x{11ff1}\x{11fff}]
Telugu(98)=[\x{0c00}-\x{0c0c}\x{0c0e}-\x{0c10}\x{0c12}-\x{0c28}\x{0c2a}-\x{0c39}\x{0c3d}-\x{0c44}\x{0c46}-\x{0c48}\x{0c4a}-\x{0c4d}\x{0c55}-\x{0c56}\x{0c58}-\x{0c5a}\x{0c60}-\x{0c63}\x{0c66}-\x{0c6f}\x{0c77}-\x{0c7f}]
Thaana(50)=[\x{0780}-\x{07b1}]
Thai(86)=[\x{0e01}-\x{0e3a}\x{0e40}-\x{0e5b}]
Tibetan(207)=[\x{0f00}-\x{0f47}\x{0f49}-\x{0f6c}\x{0f71}-\x{0f97}\x{0f99}-\x{0fbc}\x{0fbe}-\x{0fcc}\x{0fce}-\x{0fd4}\x{0fd9}-\x{0fda}]
Tifinagh(59)=[\x{2d30}-\x{2d67}\x{2d6f}-\x{2d70}\x{2d7f}]
Ugaritic(31)=[\x{10380}-\x{1039d}\x{1039f}]
Vai(300)=[\x{a500}-\x{a62b}]
Yi(1220)=[\x{a000}-\x{a48c}\x{a490}-\x{a4c6}]
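
As a sanity check on this output format, the count in parentheses can be recomputed from the character class itself, and the class can be dropped straight into a pattern. The sketch below does both for the Armenian line; `countClassCodePoints` is a hypothetical helper and not part of the script above.

```php
<?php
// Hypothetical helper: count the code points covered by a class body made of
// \x{...} items and \x{...}-\x{...} ranges, in the format printed above.
function countClassCodePoints(string $class): int
{
    preg_match_all(
        '/\\\\x\{([0-9a-f]+)\}(?:-\\\\x\{([0-9a-f]+)\})?/i',
        $class,
        $matches,
        PREG_SET_ORDER
    );
    $total = 0;
    foreach ($matches as $m) {
        $start = hexdec($m[1]);
        // A single item has no second capture group; treat it as a one-element range.
        $end   = (isset($m[2]) && $m[2] !== '') ? hexdec($m[2]) : $start;
        $total += $end - $start + 1;
    }
    return $total;
}

// Class body copied from the Armenian line of the output.
$armenian = '\x{0531}-\x{0556}\x{0559}-\x{0588}\x{058a}\x{058d}-\x{058f}\x{fb13}-\x{fb17}';

echo countClassCodePoints($armenian), PHP_EOL; // 95, matching "Armenian(95)"

// Use the class directly: U+0540 U+0561 U+0575 are Armenian letters.
var_dump(preg_match('/^[' . $armenian . ']+$/u', "\u{0540}\u{0561}\u{0575}") === 1); // bool(true)
```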

I've only compared the "Inherited" script definition visually against the Unicode one (https://unicode.org/Public/UCD/latest/ucd/Scripts.txt), but there are discrepancies. For example, U+1ABF..U+1ACE is listed as Inherited in the Unicode data file but is not matched by the PHP implementation I'm using (7.4.3).
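
One plausible explanation, offered here as an assumption rather than something verified against PHP 7.4.3, is that the PCRE library PHP is built with carries Unicode tables older than the "latest" Scripts.txt, so more recently assigned code points are not yet in any script. The sketch below prints the PCRE version through PHP's `PCRE_VERSION` constant and probes individual code points; `inScript` is a hypothetical helper.

```php
<?php
// Hypothetical helper: does PHP's PCRE place code point $cp in $script?
function inScript(int $cp, string $script): bool
{
    $ch = mb_chr($cp, 'UTF-8');
    // mb_chr() returns false for values it cannot encode (e.g. surrogates).
    if ($ch === false) {
        return false;
    }
    return preg_match('/\p{' . $script . '}/u', $ch) === 1;
}

echo 'PCRE: ', PCRE_VERSION, PHP_EOL;

// U+0485 and U+1ABE appear in the Inherited output above; the answers for
// U+1ABF and U+1ACE depend on the Unicode tables in the installed PCRE.
foreach ([0x0485, 0x1ABE, 0x1ABF, 0x1ACE] as $cp) {
    printf("U+%04X Inherited? %s\n", $cp, inScript($cp, 'Inherited') ? 'yes' : 'no');
}
```

Running this on two PHP installations side by side would show whether the gap follows the PCRE version rather than PHP itself.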
