I want to create a Hebrew dictionary, so I'm looking for something that removes everything that is not a Hebrew character. Here is what I have so far:
import random
import re

# Random number used to give the output file a unique-ish name
rand_id = random.randint(1000, 2000)
n = input("HebrewFileName?:")
with open(n + ".txt", encoding='utf-8') as fname:
    text = fname.read()
# Remove common punctuation
res = re.sub(r'[!,*)@|#%(&$_?.^]', '', text)
# Split into words and remove duplicates
lines = list(set(res.split()))
# Drop any word that contains a Latin letter anywhere in it
lines1 = [w for w in lines if not re.search(r'[a-zA-Z]', w)]
text1 = "\n".join(lines1)
# Remove digits
text2 = ''.join(ch for ch in text1 if not ch.isdigit())
with open(str(rand_id) + "-.txt", "a", encoding='utf-8') as out:
    print(text2, file=out)
print("done")
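One detail worth checking in the Latin-letter filter: re.match only tries to match at the very start of a word, while re.search scans the whole word. For a mixed word such as כַּtestחֲצִי, which starts with a Hebrew letter, only re.search finds the embedded Latin run. A minimal demonstration (the variable name word is just for illustration):

```python
import re

# כַּtestחֲצִי written with escapes: kaf, patah, dagesh, "test", het, hataf patah, tsadi, hiriq, yod
word = "\u05db\u05b7\u05bctest\u05d7\u05b2\u05e6\u05b4\u05d9"

# re.match is anchored at position 0, where the word starts with a Hebrew letter
print(re.match(r"[a-zA-Z]+", word))
# re.search scans the whole string and finds the Latin letters in the middle
print(re.search(r"[a-zA-Z]+", word).group())
```

The first call prints None and the second prints test, so a filter built on re.match would let this word through.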
How could I do that? Please give an example in code.
For example, a word should be written to the file only if all of its characters are Hebrew; if any character is not Hebrew, the word should not be added:
- Input test = "כַּחֲצִי": every character is Hebrew, so the output is the same, כַּחֲצִי.
- Input test = "כַּtestחֲצִי": it contains non-Hebrew characters, so it is deleted and the output is "" (nothing).
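The keep-or-drop rule in these examples (keep a word only if every character is Hebrew) can be sketched with a full-match regex. This assumes that "Hebrew character" means anything in the Unicode Hebrew block, U+0590 to U+05FF, which covers the letters, the vowel points (niqqud), cantillation marks, and a few Hebrew punctuation marks such as maqaf, but not the presentation forms at U+FB1D to U+FB4F. The names is_hebrew_word and keep_hebrew_words are hypothetical. Punctuation attached to a word makes it fail the check, so strip punctuation first if you need to.

```python
import re

# One or more characters from the Hebrew Unicode block (U+0590 to U+05FF)
HEBREW_WORD = re.compile(r"[\u0590-\u05FF]+")

def is_hebrew_word(word: str) -> bool:
    """Return True only if every character of `word` is in the Hebrew block."""
    return HEBREW_WORD.fullmatch(word) is not None

def keep_hebrew_words(text: str) -> list:
    """Split on whitespace and keep only the all-Hebrew words, in their original order."""
    return [w for w in text.split() if is_hebrew_word(w)]

ok = "\u05db\u05b7\u05bc\u05d7\u05b2\u05e6\u05b4\u05d9"         # כַּחֲצִי
mixed = "\u05db\u05b7\u05bctest\u05d7\u05b2\u05e6\u05b4\u05d9"   # כַּtestחֲצִי
print(keep_hebrew_words(ok + " " + mixed))  # only the first word survives
```

Using fullmatch rather than search is what turns "contains a Hebrew character" into "consists only of Hebrew characters".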
alphabet = { "א","אִ","ב","בּ","ג","ד","ה","ם","ו","וּ","ן","ז","ח","חָ","ט","י","כ","ָך","ל","מ","נ","ס","ע","פ","ף","צ","ץ","ק","ר","ש","ת"}

I'm basically looking for something that removes all characters except these and keeps the spaces.
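If character-level removal is what you want (rather than dropping whole words), a single negated character class can do it. This is a sketch that assumes the whole Hebrew Unicode block (U+0590 to U+05FF) counts as allowed, so it keeps all niqqud, not just the few pointed forms in the set above; the name strip_non_hebrew is hypothetical. Note that it removes characters inside a word, so כַּtestחֲצִי becomes כַּחֲצִי instead of being deleted.

```python
import re

# Matches any character that is neither in the Hebrew block nor whitespace
NON_HEBREW = re.compile(r"[^\u0590-\u05FF\s]")

def strip_non_hebrew(text: str) -> str:
    """Delete every non-Hebrew character while keeping spaces and newlines."""
    return NON_HEBREW.sub("", text)

print(repr(strip_non_hebrew("abc \u05e9\u05dc\u05d5\u05dd 123")))  # Latin letters and digits removed
```

Because \s is inside the negated class, spaces, tabs, and newlines all survive, which keeps the line structure of the input file.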
Answer:
# Allowed characters: a space plus the Hebrew letters (and pointed forms) from the question
alphabet = " אאִבבּגדהםווּןזחחָטיכָךלמנסעפףצץקרשת"

def letters_only(source):
    """Return only the characters of `source` that appear in `alphabet`."""
    result = ""
    for i in source.lower():
        if i in alphabet:
            result += i
    return result

with open("random.txt", encoding='utf-8') as fname:
    text = fname.read()
test = letters_only(text)
with open("random-.txt", "a", encoding='utf-8') as out:
    print(test, file=out)
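The answer's loop can also be written with a set lookup and str.join, which avoids building the result one concatenation at a time. This is a sketch that keeps the same alphabet string; the name ALLOWED is hypothetical. Dropping .lower() does not change the result, because Hebrew has no letter case and any Latin letters are discarded either way.

```python
# Same allowed characters as the `alphabet` string in the answer
alphabet = " אאִבבּגדהםווּןזחחָטיכָךלמנסעפףצץקרשת"
ALLOWED = frozenset(alphabet)  # set of individual code points, including the space

def letters_only(source: str) -> str:
    """Keep only characters found in ALLOWED; everything else, including newlines, is dropped."""
    return "".join(ch for ch in source if ch in ALLOWED)

print(repr(letters_only("abc 123 \u05e9\u05dc\u05d5\u05dd")))
```

Since "\n" is not in alphabet, all line breaks disappear and the whole file collapses onto one line; add "\n" to alphabet if you want to keep one word per line.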