Apply function to every element in a list in a df column

Time:05-18

How do I apply a function to every element in a list in every row of a dataframe?

df:

label   top_topics               
adverts ['werbung', 'geschenke']

My function looks something like this:

from langdetect import detect
from googletrans import Translator

def detect_and_translate(text):
    
    target_lang = 'en'
    try:
        result_lang = detect(text)
        
    except:
        result_lang = target_lang
    
    if result_lang == target_lang:
        
        return text, result_lang
    
    else:
        translator = Translator()
        translated_text = translator.translate(text, dest=target_lang)
        return translated_text.text, result_lang

I am expecting output like this:

 label        top_topics                 translation             language

 adverts    ['werbung', 'geschenke']       ['advertising', 'gifts']   de

I tried the following, but it didn't translate the top_topics column: apply passes each whole list to the function, so the function never loops over the individual elements.

df['translate_detect'] = df['top_topics'].apply(detect_and_translate)
df['top_topics_en'], df['language'] = df.translate_detect.str

Any help?

Answer:

First, you should never use a bare except.
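To illustrate why a bare except is risky, here is a minimal sketch. The helper names (`run_with_bare_except`, `run_with_except_exception`, `request_exit`) are made up for this example. A bare except also catches `SystemExit` and `KeyboardInterrupt`, so it can silently swallow a request to stop the program, whereas `except Exception` lets those through:

```python
import sys


def run_with_bare_except(action):
    try:
        action()
    except:  # bare except: also catches SystemExit and KeyboardInterrupt
        return "swallowed"
    return "ok"


def run_with_except_exception(action):
    try:
        action()
    except Exception:  # catches ordinary errors only
        return "handled"
    return "ok"


def request_exit():
    # Raises SystemExit, which does not inherit from Exception.
    sys.exit(1)


# The exit request is silently swallowed by the bare except.
print(run_with_bare_except(request_exit))

# With "except Exception", SystemExit propagates to the caller.
try:
    run_with_except_exception(request_exit)
except SystemExit:
    print("SystemExit propagated as intended")
```

Catching the specific exception the call can raise is narrower still, and is preferable when you know which one it is.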

Second, your function translates a single word and returns a tuple of the translated word and the detected language, so getting from that to your desired output (a list of translated words plus a single detected language) would be tedious. Instead, modify your function to take the whole list and return the translated list together with one detected language:

import googletrans
from googletrans import Translator


def detect_and_translate(lst):
    translator = Translator()
    target_lang = 'en'
    try:
        # Detect the language from the first element only
        result_lang = translator.detect(lst[0])
    except Exception:  # should be the specific exception that can occur
        # Detection failed: return the list untranslated, with no language
        return lst, None

    # Translate every element of the list
    translations = []
    for text in lst:
        translated_text = translator.translate(text, dest=target_lang)
        translations.append(translated_text.text)

    return translations, result_lang

Usage:

In [4]: googletrans.__version__
Out[4]: '4.0.0-rc.1'

In [5]: df[["topics_en", "language"]] = df.top_topics.apply(detect_and_translate).apply(pd.Series)

In [6]: df
Out[6]:
     label            top_topics             topics_en                            language
0  adverts  [werbung, geschenke]  [advertising, gifts]  Detected(lang=de, confidence=None)
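
The column-splitting step in the usage above can be tried without network access. The sketch below swaps the translator for a stub, `fake_detect_and_translate`, backed by a small hard-coded dictionary. Both are assumptions made up for illustration and are not part of googletrans. It builds the two new columns from the returned tuples with `pd.DataFrame(..., index=df.index)`, which is one alternative to `.apply(pd.Series)`:

```python
import pandas as pd

# Hypothetical stand-in for the real translator: a tiny hard-coded dictionary.
FAKE_DICTIONARY = {"werbung": "advertising", "geschenke": "gifts"}


def fake_detect_and_translate(lst):
    """Return (translated list, language code) for a list of words.

    Assumes every word in the list is German, as in the example row.
    """
    translations = [FAKE_DICTIONARY.get(word, word) for word in lst]
    return translations, "de"


df = pd.DataFrame({"label": ["adverts"], "top_topics": [["werbung", "geschenke"]]})

# apply() passes each cell (a whole list) to the function,
# producing a Series of (translations, language) tuples.
pairs = df["top_topics"].apply(fake_detect_and_translate)

# Expand the tuples into two columns, keeping the original row index.
df[["topics_en", "language"]] = pd.DataFrame(pairs.tolist(), index=df.index)

print(df)
```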

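Note that googletrans.Translator has its own language detection method, so langdetect isn't needed. It doesn't work in 3.0.0, but it will if you pip install googletrans==4.0.0rc1. The method returns a Detected object, as in the output above; use its lang attribute if you only want the language code, such as 'de'.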

Note also that for this to work, you must assume that all words in a given list are in the same language, because only the first element is used for detection. If you can't make that assumption, you'll need to detect the language of each element separately.
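
If the lists can mix languages, one option is to run detection per element and keep one language per word. The sketch below is only an illustration: `translate_each` is a hypothetical helper that takes the single-word function as a parameter, so any function returning a `(text, language)` tuple, such as the one in the question, could be passed in. A stub with made-up entries stands in for the real detector so the example runs offline:

```python
def translate_each(lst, translate_one):
    """Apply a single-word function to every element of a list.

    translate_one must return a (translated_text, detected_language) tuple.
    Returns (list of translations, list of languages), one entry per word.
    """
    results = [translate_one(text) for text in lst]
    translations = [translated for translated, _ in results]
    languages = [lang for _, lang in results]
    return translations, languages


# Hypothetical offline stub standing in for a real detect-and-translate call.
STUB = {"werbung": ("advertising", "de"), "cadeaux": ("gifts", "fr")}


def stub_translate_one(text):
    # Unknown words are returned unchanged and labeled as English.
    return STUB.get(text, (text, "en"))


print(translate_each(["werbung", "cadeaux", "sale"], stub_translate_one))
```

On a DataFrame, this could be applied as `df['top_topics'].apply(lambda lst: translate_each(lst, detect_and_translate))`, using the single-word function from the question. Expect one network call per word, which may be slow for large columns.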
