Unicode Ranges of Indian Language Characters

Time:03-22

For basic validation, I need the Unicode code point ranges of the most common Indian scripts. Please also indicate whether there are characters outside those ranges that should be included individually.

CodePudding user response:

The Indian language script ranges are as follows:

  • Devanagari: U+0900 to U+097F
  • Bengali: U+0980 to U+09FF
  • Gurmukhi: U+0A00 to U+0A7F
  • Gujarati: U+0A80 to U+0AFF
  • Odia: U+0B00 to U+0B7F
  • Tamil: U+0B80 to U+0BFF
  • Telugu: U+0C00 to U+0C7F
  • Kannada: U+0C80 to U+0CFF
  • Malayalam: U+0D00 to U+0D7F
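
As a rough illustration, the block ranges above could be kept in a lookup table and used to report which script a character belongs to. This is only a minimal sketch: the names `INDIC_BLOCKS` and `script_of` are made up for this example, and a block range also covers code points that are currently unassigned, so it is a coarse check.

```python
# Inclusive code point ranges of the blocks listed above.
# INDIC_BLOCKS and script_of are hypothetical names used for illustration.
INDIC_BLOCKS = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Gurmukhi": (0x0A00, 0x0A7F),
    "Gujarati": (0x0A80, 0x0AFF),
    "Odia": (0x0B00, 0x0B7F),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Kannada": (0x0C80, 0x0CFF),
    "Malayalam": (0x0D00, 0x0D7F),
}


def script_of(ch):
    """Return the block name for a single character, or None if it is outside all blocks."""
    cp = ord(ch)
    for name, (lo, hi) in INDIC_BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None
```

For example, `script_of("\u0905")` (DEVANAGARI LETTER A) falls in the Devanagari range, while an ASCII letter such as `"a"` gives `None`.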

In addition, the following characters lie outside these ranges but are frequently used:

  • Zero Width Non-Joiner: U+200C
  • Zero Width Joiner: U+200D
  • Indian Rupee Sign: U+20B9

Also consider allowing ASCII punctuation.
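
Putting it together, a basic validator might look like the sketch below. The nine blocks happen to be contiguous, so they collapse into the single range U+0900 to U+0D7F. The names `is_allowed` and `is_valid` are hypothetical, and allowing the space character is my own assumption (it is not punctuation). Note that ASCII digits and letters are rejected here; add them if your input needs them.

```python
import string

# The nine script blocks are adjacent, so one inclusive range covers them all.
INDIC_START, INDIC_END = 0x0900, 0x0D7F

# Individually allowed characters outside that range.
EXTRA_CHARS = {
    "\u200C",  # ZERO WIDTH NON-JOINER
    "\u200D",  # ZERO WIDTH JOINER
    "\u20B9",  # INDIAN RUPEE SIGN
}

# ASCII punctuation as defined by Python's string module.
ASCII_PUNCT = set(string.punctuation)


def is_allowed(ch, allow_space=True):
    """Coarse per-character check; allow_space is an assumption, not from the answer."""
    if INDIC_START <= ord(ch) <= INDIC_END:
        return True
    if ch in EXTRA_CHARS or ch in ASCII_PUNCT:
        return True
    return allow_space and ch == " "


def is_valid(text):
    """True if every character of text passes is_allowed."""
    return all(is_allowed(ch) for ch in text)
```

With this sketch, a Devanagari word followed by `!` passes, while a plain ASCII word such as `hello` is rejected because ASCII letters are not in any allowed set.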
