How to remove emoji from a dataset python?-CodePudding

I have the following dataset in Russian. I need to remove emoticons from it. I've already found a couple of examples, but it completely breaks my text, because all examples are used for examples in English. Help me figure it out.

CodePudding user response：

You can use the emoji package here and use that with regex like this

import emoji
emoji_regex = emoji.get_emoji_regexp()
title = emoji_regex.sub("",df["title"])
text = emoji_regex.sub("",df["text"])

CodePudding user response：

It worked for me and Russian

filter_char = lambda x: ord(x) < 2048
Data_preprocessing['tweet_without_stopwords'] = Data_preprocessing['tweet_without_stopwords'].apply(lambda x: ''.join(filter(filter_char, x)))