Hebrew dictionary: remove all non-Hebrew characters

Time:05-29

I want to create a Hebrew dictionary, and then run it through something that removes everything that is not a Hebrew character.

import random
import re

# Random number used to name the output file
random_id = random.randint(1000, 2000)

n = input("HebrewFileName?:")

with open(n + ".txt", encoding='utf-8') as fname:
    text = fname.read()
    # Drop some ASCII punctuation
    res = re.sub('[!,*)@|#%(&$_?.^]', '', text)
    # Unique words (set() does not keep the original order)
    lst = list(set(res.split()))
    str1 = ' '.join(str(e) for e in lst)
    lines = str1.split(' ')
    # Drop words that START with a Latin letter; re.match only checks the
    # beginning, so a mixed word like "כַּtestחֲצִי" is not removed
    lines1 = list(filter(lambda w: not re.match(r'[a-zA-Z]+', w), lines))
    text1 = "\n".join(lines1)
    # Remove digits character by character
    text2 = ''.join(filter(lambda x: not x.isdigit(), text1))

with open(str(random_id) + "-.txt", "a", encoding='utf-8') as out:
    print(text2, file=out)
print("done")

How could I do that? Please give an example in code.

For example:

test = "כַּחֲצִי" is entirely Hebrew, so it should be written to the file.

If a word does not consist entirely of Hebrew characters, it should not be added.

Example input: test = "כַּחֲצִי", output is the same: כַּחֲצִי

If a word contains non-Hebrew characters, delete it: test = "כַּtestחֲצִי", output is "" (nothing).
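The rule in these examples is word-level: a word is kept only if every character in it is Hebrew, and otherwise it is dropped as a whole. A minimal sketch, assuming "Hebrew" means any code point in the Unicode Hebrew block (U+0590 to U+05FF, which covers letters, final forms and niqqud) and that words are separated by whitespace; `keep_hebrew_words` is a hypothetical helper name.

```python
import re

# Assumption: "Hebrew" = any code point in the Unicode Hebrew block
# (U+0590 to U+05FF): letters, final forms and niqqud.
HEBREW_WORD = re.compile(r"[\u0590-\u05FF]+")


def keep_hebrew_words(text):
    """Return the whitespace-separated words that are entirely Hebrew."""
    # fullmatch() succeeds only if every character of the word is Hebrew,
    # so a mixed word is dropped as a whole instead of being trimmed.
    return [word for word in text.split() if HEBREW_WORD.fullmatch(word)]


print(keep_hebrew_words("כַּחֲצִי"))      # ['כַּחֲצִי']
print(keep_hebrew_words("כַּtestחֲצִי"))  # []
```

To write the kept words to a file, one per line, join the list with `"\n"`.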

alphabet = {"א","אִ","ב","בּ","ג","ד","ה","ם","ו","וּ","ן","ז","ח","חָ","ט","י","כ","ָך","ל","מ","נ","ס","ע","פ","ף","צ","ץ","ק","ר","ש","ת"}

I'm basically looking for something that removes all characters except these and keeps the spaces.
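The character-level version of this request (keep only Hebrew, keep spaces) can also be written as one regular expression instead of listing every letter. This is a sketch under an assumption that differs from the set above: it keeps the whole Unicode Hebrew block (U+0590 to U+05FF), which includes all niqqud and Hebrew punctuation such as maqaf and geresh. For bare letters only, the range U+05D0 to U+05EA (א to ת) could be used instead. `strip_non_hebrew` is a hypothetical name.

```python
import re

# Assumption: "Hebrew" = any code point in the Unicode Hebrew block
# (U+0590 to U+05FF). \s keeps spaces and line breaks.
NON_HEBREW = re.compile(r"[^\u0590-\u05FF\s]")


def strip_non_hebrew(text):
    """Delete every character that is neither Hebrew nor whitespace."""
    return NON_HEBREW.sub("", text)


print(strip_non_hebrew("abcאב 12ג"))  # אב ג
```

Because `\s` also matches newlines, the line structure of the input file survives the filtering.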

Answer:

alphabet = " אאִבבּגדהםווּןזחחָטיכָךלמנסעפףצץקרשת"  # the leading space keeps spaces


def letters_only(source):
    # Keep only the characters that appear in `alphabet`
    result = ""
    for i in source.lower():
        if i in alphabet:
            result += i
    return result


with open("random.txt", encoding='utf-8') as fname:
    text = fname.read()

# Append the filtered text to the output file and close it afterwards
with open("random-.txt", "a", encoding='utf-8') as out:
    print(letters_only(text), file=out)
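`letters_only` works character by character, so "כַּtestחֲצִי" comes out with "test" cut out rather than being dropped, and only the niqqud marks that happen to appear in `alphabet` survive. Below is a sketch of the whole file-to-dictionary flow from the question using the word-level rule instead. Assumptions: "Hebrew" means the Unicode Hebrew block (U+0590 to U+05FF), ASCII punctuation from `string.punctuation` is stripped first (similar in spirit to the question's `re.sub`), and `dictionary_words` / `build_dictionary` are hypothetical names.

```python
import re
import string
from pathlib import Path

# Assumption: a Hebrew word is a run of code points from the Unicode
# Hebrew block (U+0590 to U+05FF): letters, final forms and niqqud.
HEBREW_WORD = re.compile(r"[\u0590-\u05FF]+")
# Drop ASCII punctuation before splitting, so "שלום," still counts as Hebrew.
STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def dictionary_words(text):
    """Return the distinct all-Hebrew words of text, in first-seen order."""
    words = text.translate(STRIP_PUNCT).split()
    # dict.fromkeys() removes duplicates but, unlike set(), keeps input order.
    return list(dict.fromkeys(w for w in words if HEBREW_WORD.fullmatch(w)))


def build_dictionary(src, dst):
    """Read src and write one dictionary word per line to dst."""
    text = Path(src).read_text(encoding="utf-8")
    words = dictionary_words(text)
    # Overwrites dst (the question's code appends with mode "a").
    Path(dst).write_text("".join(w + "\n" for w in words), encoding="utf-8")
    return words
```

For example, `build_dictionary("words.txt", "1234-.txt")` (hypothetical file names) would write each distinct all-Hebrew word of `words.txt` on its own line. Digits need no separate pass: a token such as "42" or "שלום123" is not entirely Hebrew, so it is dropped.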