When doing translation, how can I skip a row if the text is already in the target language?

Time:12-30

I am using EasyNMT to translate from English to German. Here is my code:

import pandas as pd, warnings
from easynmt import EasyNMT

warnings.filterwarnings('ignore')
model = EasyNMT('opus-mt')

df = pd.read_excel('test.xlsx')

def en_de(x):
    x = model.translate(x, source_lang = 'en', target_lang = 'de')
    return x

df['col_tl'] = df['col1'].apply(en_de)

I want to skip rows that are in German and only translate rows that are in English. Is this possible?

Here's the sample data, where the last row is in German:

col1
The cat sat on the windowsill, gazing out at the birds flying by.
The sun was setting over the ocean, painting the sky with a beautiful array of orange and pink hues.
The young man walked through the park, lost in thought as he listened to his favorite music on his headphones.
The small town was nestled in the rolling hills of the countryside, its quaint streets lined with colorful houses and shops.
The old oak tree stood tall and proud, its branches reaching up to the clear blue sky.
Die Katze saß auf der Fensterbank und schaute auf die vorbeifliegenden Vögel.

CodePudding user response:

You can use EasyNMT's language_detection method to check each row's language and translate only the English ones:

def en_de(x):
    # Detect the language of the input text
    language = model.language_detection(x)
    # Translate only if the text is English; otherwise return it unchanged
    if language == 'en':
        x = model.translate(x, source_lang='en', target_lang='de')
    return x
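
One case the function above does not cover is a missing or blank cell, which a spreadsheet column can easily contain. Here is a minimal sketch of a guarded version, assuming the detector and translator are passed in as plain callables. The name translate_unless_target and its parameters are hypothetical and not part of EasyNMT. Note that it skips a row when the detected language equals the target, rather than translating only when the text is detected as English.

```python
from typing import Callable


def translate_unless_target(
    text,
    detect: Callable[[str], str],
    translate: Callable[[str], str],
    target_lang: str = "de",
):
    """Return `text` unchanged when it is already in `target_lang`."""
    # Leave missing cells (NaN/None) and blank strings untouched.
    if not isinstance(text, str) or not text.strip():
        return text
    # Skip rows that are already in the target language.
    if detect(text) == target_lang:
        return text
    return translate(text)


# Possible wiring with the model and DataFrame from the question (assumption,
# not run here):
# df['col_tl'] = df['col1'].apply(
#     translate_unless_target,
#     detect=model.language_detection,
#     translate=lambda t: model.translate(t, source_lang='en', target_lang='de'),
# )
```

Passing the callables in keeps the skip logic independent of the detection library, so it can be checked with simple stand-in functions before loading the real model.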

CodePudding user response:

Many packages can detect the language of a text, such as langdetect.

1. Install langdetect:

pip install langdetect

2. Demo code:

import warnings

import pandas as pd
from langdetect import detect_langs
from easynmt import EasyNMT

warnings.filterwarnings('ignore')
model = EasyNMT('opus-mt')

samlldata1 = pd.DataFrame({'col1':[
"The cat sat on the windowsill, gazing out at the birds flying by.",
"The sun was setting over the ocean, painting the sky with a beautiful array of orange and pink hues.",
"The young man walked through the park, lost in thought as he listened to his favorite music on his headphones.",
"The small town was nestled in the rolling hills of the countryside, its quaint streets lined with colorful houses and shops.",
"The old oak tree stood tall and proud, its branches reaching up to the clear blue sky.",
"Die Katze saß auf der Fensterbank und schaute auf die vorbeifliegenden Vögel."
]})

samlldata1

# Improved version of your function:
def en_de(x: str) -> str:
    # detect_langs returns candidates sorted by probability; take the most likely one
    input_lang_type = detect_langs(x)[0].lang
    if input_lang_type == "de":
        # Already German: return the text unchanged
        return x
    return model.translate(x, source_lang='en', target_lang='de')


samlldata1.assign(de_text=samlldata1['col1'].apply(en_de))
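
Applying the function row by row calls the translator once per row. Since EasyNMT's translate also accepts a list of texts, the rows that need translating can be collected and sent in one call. Below is a rough sketch of that idea, again with the detector and batch translator injected as callables. The function name translate_batch_skipping_target and its parameters are made up for illustration.

```python
from typing import Callable, List, Sequence


def translate_batch_skipping_target(
    texts: Sequence[str],
    detect: Callable[[str], str],
    translate_batch: Callable[[List[str]], List[str]],
    target_lang: str = "de",
) -> List[str]:
    """Translate only the texts not already in `target_lang`, in one batch."""
    # Positions of the rows that still need translating.
    todo = [i for i, t in enumerate(texts) if detect(t) != target_lang]
    result = list(texts)
    if todo:
        translated = translate_batch([texts[i] for i in todo])
        # Put each translation back at its original position.
        for i, new_text in zip(todo, translated):
            result[i] = new_text
    return result


# Possible wiring with the demo above (assumption, not run here):
# samlldata1['de_text'] = translate_batch_skipping_target(
#     samlldata1['col1'].tolist(),
#     detect=lambda t: detect_langs(t)[0].lang,
#     translate_batch=lambda batch: model.translate(
#         batch, source_lang='en', target_lang='de'),
# )
```

The output list has the same length and order as the input, so it can be assigned straight back to a DataFrame column.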