Home > Software engineering >  How to remove emoji from a dataset python?
How to remove emoji from a dataset python?

Time:11-03

I have the following dataset in Russian. I need to remove emoticons from it. I've already found a couple of examples, but it completely breaks my text, because all examples are used for examples in English. Help me figure it out.

enter image description here

CodePudding user response:

You can use the emoji package here and use that with regex like this

import emoji
emoji_regex = emoji.get_emoji_regexp()
title = emoji_regex.sub("",df["title"])
text = emoji_regex.sub("",df["text"])

CodePudding user response:

It worked for me and Russian

filter_char = lambda x: ord(x) < 2048
Data_preprocessing['tweet_without_stopwords'] = Data_preprocessing['tweet_without_stopwords'].apply(lambda x: ''.join(filter(filter_char, x)))
  • Related