I have a string, ori_string. How can I use a regexp
to remove every character that is not Chinese or English? Thanks!
ori_string<-"没a w t _ 中/国.sz"
The desired result is:
"没awt中国sz"
CodePudding user response:
You didn't specify a language, so I wrote it in Python. The idea is:
import re

def remove_non_english_chinese(text):
    # Match any character that is not an ASCII letter, a digit,
    # or a Chinese character in the range U+4E00..U+9FFF
    pattern = r'[^a-zA-Z0-9\u4e00-\u9fff]'
    # Replace every matched character with an empty string
    return re.sub(pattern, '', text)
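To check the idea against the string from the question, here is a minimal self-contained sketch. The names `KEEP_PATTERN` and `keep_english_chinese` are made up for illustration; the character class is the one from the answer. Note that it also keeps the digits 0-9, and that U+4E00..U+9FFF only covers the basic CJK Unified Ideographs block, so rarer characters from the extension blocks would be removed.

```python
import re

# Hypothetical name; the character class is the one used in the answer.
# Anything that is NOT an ASCII letter, a digit, or in U+4E00..U+9FFF matches.
KEEP_PATTERN = re.compile(r'[^a-zA-Z0-9\u4e00-\u9fff]')

def keep_english_chinese(text):
    """Delete every character matched by KEEP_PATTERN."""
    return KEEP_PATTERN.sub('', text)

ori_string = "没a w t _ 中/国.sz"
result = keep_english_chinese(ori_string)
print(result)
```

If digits should be dropped as well, removing `0-9` from the character class should be enough.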
CodePudding user response:
It seems you want to remove punctuation and spaces:
> regex <- '[[:punct:][:space:]]+'
> gsub(regex, '', ori_string)
[1] "没awt中国sz"
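For comparison, roughly the same "remove punctuation and spaces" idea can be sketched in Python without a regex. This is only an approximation of the POSIX classes, under the assumption that Unicode general categories starting with `P` (punctuation) and `S` (symbols) stand in for `[:punct:]` and `str.isspace()` stands in for `[:space:]`. The function name `strip_punct_and_space` is hypothetical.

```python
import unicodedata

def strip_punct_and_space(text):
    """Drop whitespace plus characters in Unicode categories P* and S*."""
    return ''.join(
        ch for ch in text
        # category(ch) is a two-letter code such as 'Po' or 'Sm';
        # its first letter gives the major class.
        if not ch.isspace() and unicodedata.category(ch)[0] not in ('P', 'S')
    )

ori_string = "没a w t _ 中/国.sz"
cleaned = strip_punct_and_space(ori_string)
print(cleaned)
```

Unlike the whitelist approach, this keeps letters from any script, so it only gives the desired result when the input contains nothing but Chinese, English, punctuation, and spaces.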