How do I apply a function to every element of the list in each row of a DataFrame?
df:

```
label    top_topics
adverts  ['werbung', 'geschenke']
```
My function looks something like this:

```python
from langdetect import detect
from googletrans import Translator

def detect_and_translate(text):
    target_lang = 'en'
    try:
        result_lang = detect(text)
    except:
        result_lang = target_lang
    if result_lang == target_lang:
        return text, result_lang
    else:
        translator = Translator()
        translated_text = translator.translate(text, dest=target_lang)
        return translated_text.text, result_lang
```
I'm expecting output like this:

```
label    top_topics                 translation                language
adverts  ['werbung', 'geschenke']   ['advertising', 'gifts']   de
```
I tried the following, but it didn't translate the `top_topics` column, because the function receives the whole list instead of looping through each element in it:

```python
df['translate_detect'] = df['top_topics'].apply(detect_and_translate)
df['top_topics_en'], df['language'] = df.translate_detect.str
```
Any help?
Answer:
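First, you should never use a bare `except:`. It catches every exception, including `KeyboardInterrupt` and `SystemExit`, and silently hides bugs; catch the specific exception you expect instead.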
Second, your function takes a single word and returns a `(translated_word, detected_language)` tuple, so getting your desired output (a list of translated words plus a single detected language) out of it would be difficult and tedious. Instead, modify the function to take the whole list:
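```python
import googletrans
import pandas as pd
from googletrans import Translator


def detect_and_translate(lst):
    translator = Translator()
    target_lang = 'en'
    try:
        # Detect from the first word only: assumes every word in the list shares one language.
        result_lang = translator.detect(lst[0])
    except Exception:  # should be the specific exception that can occur
        return lst, None  # detection failed, so return the words untranslated
    translations = []
    for text in lst:
        translated_text = translator.translate(text, dest=target_lang)
        translations.append(translated_text.text)
    return translations, result_lang
```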
Usage:
```
In [4]: googletrans.__version__
Out[4]: '4.0.0-rc.1'

In [5]: df[["topics_en", "language"]] = df.top_topics.apply(detect_and_translate).apply(pd.Series)

In [6]: df
Out[6]:
     label            top_topics             topics_en                            language
0  adverts  [werbung, geschenke]  [advertising, gifts]  Detected(lang=de, confidence=None)
```
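The pandas side of this pattern can be checked offline. The sketch below is a minimal illustration, assuming a hypothetical lookup table `FAKE_DICTIONARY` and a hypothetical helper `fake_translate_list` standing in for the googletrans calls (its "detection" rule is a placeholder, not real language detection). It shows the two steps: `apply` passes each row's whole list to the function, and the returned `(translations, language)` tuples are then split into two columns.

```python
import pandas as pd

# Hypothetical stand-in for googletrans so the example runs offline:
# a tiny German -> English lookup table.
FAKE_DICTIONARY = {"werbung": "advertising", "geschenke": "gifts"}


def fake_translate_list(words):
    """Translate every word in one row's list; return (translations, language).

    Hypothetical helper. Unknown words are returned unchanged, and the
    "detection" rule is a placeholder: 'de' if any word is in the table.
    """
    translations = [FAKE_DICTIONARY.get(word, word) for word in words]
    language = "de" if any(word in FAKE_DICTIONARY for word in words) else "en"
    return translations, language


df = pd.DataFrame({"label": ["adverts"], "top_topics": [["werbung", "geschenke"]]})

# Series.apply calls the function once per row, passing that row's whole list.
results = df["top_topics"].apply(fake_translate_list)

# Each result is a (translations, language) tuple; turn them into two columns.
expanded = pd.DataFrame(results.tolist(), index=df.index, columns=["topics_en", "language"])
df = df.join(expanded)
print(df)
```

Building the new columns with `pd.DataFrame(results.tolist(), index=df.index, ...)` is an alternative to `.apply(pd.Series)`: it creates both columns in one step instead of constructing a Series for every row.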
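Note that `googletrans.Translator` has its own language-detection method, `detect()`. It doesn't work in 3.0.0, but it does if you `pip install googletrans==4.0.0rc1`. It returns a `Detected` object, as in the output above; use its `.lang` attribute if you only want the language code, such as `'de'`.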
Note also that this only works if you can assume that all words in a given list are in the same language, because the language is detected from the first word only. If you can't make that assumption, you'll need a different approach.
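One such approach (not part of the answer above) is to `explode` the lists so the function runs once per word, then regroup the results by the original row index. This is a minimal sketch, assuming a hypothetical table `FAKE_LOOKUP` and a hypothetical helper `fake_detect_and_translate` standing in for the real langdetect/googletrans calls so it runs offline. Each row ends up with a list of per-word language codes instead of a single code.

```python
import pandas as pd

# Hypothetical stand-in for a per-word detect-and-translate call:
# maps each word to (English translation, detected language code).
FAKE_LOOKUP = {
    "werbung": ("advertising", "de"),
    "geschenke": ("gifts", "de"),
    "cadeaux": ("gifts", "fr"),
}


def fake_detect_and_translate(word):
    """Return (translation, language) for one word; unknown words pass through as English."""
    return FAKE_LOOKUP.get(word, (word, "en"))


df = pd.DataFrame(
    {
        "label": ["adverts", "presents"],
        "top_topics": [["werbung", "geschenke"], ["cadeaux", "sale"]],
    }
)

# explode() yields one row per list element, repeating the original row index.
words = df["top_topics"].explode()
pairs = words.map(fake_detect_and_translate)

# Group by the original index (level=0) to rebuild one list per row;
# assignment aligns the grouped results back onto df by index.
df["topics_en"] = pairs.map(lambda p: p[0]).groupby(level=0).agg(list)
df["languages"] = pairs.map(lambda p: p[1]).groupby(level=0).agg(list)
print(df)
```

This makes one call per word, like the loop in the answer, but each word gets its own detected language, so mixed-language lists are handled.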