I have the following dataset in Russian. I need to remove emoticons from it. I've already found a couple of examples, but it completely breaks my text, because all examples are used for examples in English. Help me figure it out.
CodePudding user response:
You can use the emoji
package here and use that with regex like this
import emoji
emoji_regex = emoji.get_emoji_regexp()
title = emoji_regex.sub("",df["title"])
text = emoji_regex.sub("",df["text"])
CodePudding user response:
It worked for me and Russian
filter_char = lambda x: ord(x) < 2048
Data_preprocessing['tweet_without_stopwords'] = Data_preprocessing['tweet_without_stopwords'].apply(lambda x: ''.join(filter(filter_char, x)))