Check if csv contains Chinese characters in python then output

Time:06-26

I have a CSV file that contains English and Chinese text. How can I separate the rows and save the ones that contain Chinese as "Chinese" and the ones that don't as "English"? I found some code to tell them apart, but I don't know how to save the results.


def is_chinese(string):
    for ch in string:
        if u'\u4e00' <= ch <= u'\u9fff':
            return True

    return False

ret1 = is_chinese("a中国aaa")
print(ret1)

ret2 = is_chinese("123")
print(ret2)

The CSV file:

"sex","name","age"
"1","hali","18"
"2","张三","24"
"1","云lee","20"

I want to classify it like this. English:

"sex","name","age"
"1","hali","18"

Chinese:

"sex","name","age"
"2","张三","24"
"1","云lee","20"

CodePudding user response:

This code finds the lines that contain Chinese characters and saves them to a file called "detected.txt":

import re

with open('01.csv', 'r', encoding='utf-8') as file:  # Open the CSV file
    with open('detected.txt', 'w', encoding='utf-8') as f:  # Open the output file for writing
        for line in file:  # Read each line of the CSV file
            if re.search(r'[\u4e00-\u9fff]', line):  # If the line contains a Chinese character
                f.write(line)  # Write the line to the output file
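
To get the two separate outputs the question asks for, with the header repeated in each, something along these lines might work. It is only a sketch: it reuses the question's `is_chinese` check, while the helper names `split_rows` and `split_csv` and the output file names `Chinese.csv` and `English.csv` are made up for illustration. Note that the range `\u4e00` to `\u9fff` covers only the main CJK Unified Ideographs block, so rarer characters from the extension blocks would not be detected.

```python
import csv


def is_chinese(text):
    """Return True if text has a character in the range U+4E00 to U+9FFF."""
    return any('\u4e00' <= ch <= '\u9fff' for ch in text)


def split_rows(rows):
    """Split rows (the first row is the header) into (chinese, english).

    Each returned list starts with a copy of the header row.
    """
    if not rows:
        return [], []
    header, *body = rows
    chinese, english = [header], [header]
    for row in body:
        # A row counts as Chinese if any of its cells contains a Chinese character.
        if any(is_chinese(cell) for cell in row):
            chinese.append(row)
        else:
            english.append(row)
    return chinese, english


def split_csv(src, chinese_path, english_path):
    """Read src and write the two groups of rows to separate CSV files."""
    with open(src, newline='', encoding='utf-8') as f:
        rows = list(csv.reader(f))
    chinese, english = split_rows(rows)
    for path, data in ((chinese_path, chinese), (english_path, english)):
        with open(path, 'w', newline='', encoding='utf-8') as out:
            # QUOTE_ALL keeps every field quoted, as in the sample file.
            csv.writer(out, quoting=csv.QUOTE_ALL).writerows(data)


# Example call (file names are assumptions):
# split_csv('01.csv', 'Chinese.csv', 'English.csv')
```

Using the `csv` module instead of matching raw lines means a quoted field that contains a comma or a line break is still treated as one cell.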
    