I have a CSV file that contains both English and Chinese text. How can I separate the rows and save those that contain Chinese as "Chinese" and those that don't as "English"? I found some code that tells them apart, but I don't know how to save the results.
def is_chinese(string):
    for ch in string:
        if u'\u4e00' <= ch <= u'\u9fff':
            return True
    return False

ret1 = is_chinese("a中国aaa")
print(ret1)
ret2 = is_chinese("123")
print(ret2)
CSV file:
"sex","name","age"
"1","hali","18"
"2","张三","24"
"1","云lee","20"
I want to classify it like this:
English:
"sex","name","age"
"1","hali","18"
Chinese:
"sex","name","age"
"2","张三","24"
"1","云lee","20"
CodePudding user response:
This code writes every line that contains a Chinese character to a file called "detected.txt".
import re

characters = []
i = 0
with open('01.csv', 'r', encoding='utf-8') as file:  # Open the CSV file
    with open('detected.txt', 'w', encoding='utf-8') as f:  # Open the output file for writing
        for line in file.readlines():  # Read each line of the CSV file
            if re.findall(r'[\u4e00-\u9fff]+', line) == []:  # If there is no Chinese character in the line
                pass
            else:
                characters.append(re.findall(r'[\u4e00-\u9fff]+', line))  # Append the Chinese characters to the list
                if str(characters[i][0]) in line:  # If the Chinese text is in the line
                    f.write(line)  # Write the line to the file
                i += 1
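The answer above saves only the lines that contain Chinese and does not carry the header over. To get the two outputs asked for in the question, one option is to route every row to one of two files and copy the header into both. The following is a minimal sketch, not a definitive implementation: the function name `split_csv` and the output names `English.csv` and `Chinese.csv` are assumptions made for illustration, and "Chinese" here means only the `\u4e00`-`\u9fff` range already used in the question.

```python
import csv
import re

# Same range as in the question: the basic CJK Unified Ideographs block.
CHINESE = re.compile(r'[\u4e00-\u9fff]')


def split_csv(src, english_path, chinese_path):
    """Copy the header to both outputs, then route each row by its content.

    split_csv and both output paths are hypothetical names for this sketch.
    """
    with open(src, newline='', encoding='utf-8') as fin, \
         open(english_path, 'w', newline='', encoding='utf-8') as f_en, \
         open(chinese_path, 'w', newline='', encoding='utf-8') as f_zh:
        reader = csv.reader(fin)
        # QUOTE_ALL keeps every field quoted, as in the sample file.
        en = csv.writer(f_en, quoting=csv.QUOTE_ALL)
        zh = csv.writer(f_zh, quoting=csv.QUOTE_ALL)

        header = next(reader, None)
        if header is None:  # Empty input: nothing to split.
            return
        en.writerow(header)
        zh.writerow(header)

        for row in reader:
            # A row counts as Chinese if any of its cells contains a match.
            has_chinese = any(CHINESE.search(cell) for cell in row)
            (zh if has_chinese else en).writerow(row)
```

Calling `split_csv('01.csv', 'English.csv', 'Chinese.csv')` on the sample file should put the "hali" row in `English.csv` and the "张三" and "云lee" rows in `Chinese.csv`, each under the original header. Using the `csv` module rather than raw lines means quoted fields that contain commas or line breaks are still treated as a single row.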