Passing a function over a list of columns with numpy.vectorize or DataFrame.apply?

I've got the following DataFrame:

df = pd.DataFrame(data= {'Product_JP': ['ﾄﾏﾄｺ- ｻﾙｻ C225G','ﾏﾄｹﾁﾔﾂﾌﾟ','ﾄﾏﾄｹﾁﾔﾂﾌﾟﾊﾞﾘﾕ-','ｹﾁﾔﾂﾌﾟﾊ-ﾌ','ﾄﾏﾄｹﾁﾔﾂﾌﾟﾌﾟﾚﾐｱﾑ'],
                  'Value1': [1,12313,1.123,0.112,0],
                  'Metric1_JP': ['ﾏ-ｹｯﾄｻｲｽﾞ(販売金額(x1000))','加重販売率(販売金額)','ｱｲﾃﾑ販売店当り(販売個数)','加重販売率(販売金額)','加重販売率(販売金額)'],
                  'Type_JP': ['サルサソ−ス','ケチャップ','ケチャップ','ケチャップ','ケチャップ'],
                  'SKU': [4582152498325,4582112498325,4500152498325,4582112398325,4582152483125]},
                 )


        Product_JP     Value1              Metric1_JP Type_JP            SKU
0  ﾄﾏﾄｺ- ｻﾙｻ C225G      1.000  ﾏ-ｹｯﾄｻｲｽﾞ(販売金額(x1000))  サルサソ−ス  4582152498325
1         ﾏﾄｹﾁﾔﾂﾌﾟ  12313.000             加重販売率(販売金額)   ケチャップ  4582112498325
2   ﾄﾏﾄｹﾁﾔﾂﾌﾟﾊﾞﾘﾕ-      1.123         ｱｲﾃﾑ販売店当り(販売個数)   ケチャップ  4500152498325
3        ｹﾁﾔﾂﾌﾟﾊ-ﾌ      0.112             加重販売率(販売金額)   ケチャップ  4582112398325
4  ﾄﾏﾄｹﾁﾔﾂﾌﾟﾌﾟﾚﾐｱﾑ      0.000             加重販売率(販売金額)   ケチャップ  4582152483125

And I can apply the following function to a single column using Series.apply():

from deep_translator import (GoogleTranslator)
df['Product_EN'] = df['Product_JP'].apply(lambda row:GoogleTranslator(source='ja', target='en').translate(row))

        Product_JP     Value1              Metric1_JP Type_JP            SKU  \
0  ﾄﾏﾄｺ- ｻﾙｻ C225G      1.000  ﾏ-ｹｯﾄｻｲｽﾞ(販売金額(x1000))  サルサソ−ス  4582152498325   
1         ﾏﾄｹﾁﾔﾂﾌﾟ  12313.000             加重販売率(販売金額)   ケチャップ  4582112498325   
2   ﾄﾏﾄｹﾁﾔﾂﾌﾟﾊﾞﾘﾕ-      1.123         ｱｲﾃﾑ販売店当り(販売個数)   ケチャップ  4500152498325   
3        ｹﾁﾔﾂﾌﾟﾊ-ﾌ      0.112             加重販売率(販売金額)   ケチャップ  4582112398325   
4  ﾄﾏﾄｹﾁﾔﾂﾌﾟﾌﾟﾚﾐｱﾑ      0.000             加重販売率(販売金額)   ケチャップ  4582152483125   

             Product_EN  
0  Tomatoco-Salsa C225G  
1               Matthew  
2          Tomato miser  
3                 Catch  
4  Tomato miser premium

But what I want to do is pass a list of columns to apply in one go, like so:

JP_columns = [column for column in df.columns if '_JP' in column]
EN_columns = [column.replace('_JP', '_EN') for column in JP_columns]

df[EN_columns] = df[JP_columns].apply(lambda row:GoogleTranslator(source='ja', target='en').translate(row))

This returns a ValueError: "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

What am I doing wrong with df.apply()?
Would this be better done using np.vectorize?

For example, this also raises a ValueError ("The truth value of a DataFrame is ambiguous"):

df[EN_columns] = np.vectorize(GoogleTranslator(source='ja', target='en').translate(df[JP_columns]))

Thanks

Answer:

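Series.apply calls the function on each element (cell) of the Series, since a Series has only one dimension. DataFrame.apply, however, passes each entire column to the function as a Series by default (axis=0). translate expects a single string, not a collection, so handing it a whole column fails, here with the "truth value of a Series is ambiguous" ValueError.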
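To see the difference directly, here is a small offline sketch. The toy frame `jp` and the helper `kind` are illustrative and not from the question. It reports what each apply variant hands to the function, and shows how a Series behaves in a boolean context. The assumption that translate performs an `if not text:`-style check on its input is a guess about deep_translator's internals, not something confirmed here.

```python
import pandas as pd

# Toy frame standing in for df[JP_columns]; values are illustrative.
jp = pd.DataFrame({"Product_JP": ["ﾏﾄｹﾁﾔﾂﾌﾟ", "ｹﾁﾔﾂﾌﾟﾊ-ﾌ"],
                   "Type_JP": ["ケチャップ", "ケチャップ"]})

def kind(value):
    """Report the type of whatever apply passed in."""
    return type(value).__name__

# Series.apply: called once per element, and each element is a plain str.
series_kinds = jp["Type_JP"].apply(kind)

# DataFrame.apply (axis=0 by default): called once per column, each a Series.
frame_kinds = jp.apply(kind)

# A function that tests its input for truthiness (e.g. `if not text:`)
# fails when handed a Series, because a Series has no single truth value.
error_message = ""
try:
    bool(jp["Type_JP"])  # what `if not text:` does implicitly
except ValueError as exc:
    error_message = str(exc)

print(series_kinds.tolist())  # ['str', 'str']
print(frame_kinds.to_dict())  # {'Product_JP': 'Series', 'Type_JP': 'Series'}
print(error_message)
```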

The DataFrame method for applying a function to every individual cell is applymap (pandas 2.1 renamed it to DataFrame.map and deprecated applymap), and it can be used like this:

JP_columns = [column for column in df.columns if '_JP' in column]
EN_columns = [column.replace('_JP', '_EN') for column in JP_columns]

# apply to all cells in the DataFrame
df[EN_columns] = df[JP_columns].applymap(
    GoogleTranslator(source='ja', target='en').translate
)
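Because GoogleTranslator needs network access, here is a minimal offline sketch of the same cell-wise pattern. `fake_translate` and its small glossary are hypothetical stand-ins for `GoogleTranslator(source='ja', target='en').translate`; the glossary entries are copied from the results shown in this answer. The sketch uses DataFrame.map when the installed pandas has it (2.1+) and falls back to applymap otherwise.

```python
import pandas as pd

# Hypothetical offline stand-in for GoogleTranslator(source='ja', target='en').translate.
GLOSSARY = {
    "ケチャップ": "ketchup",
    "加重販売率(販売金額)": "Weighted sales rate (sales amount)",
}

def fake_translate(text):
    # Like the real translate, this expects exactly one string per call.
    if not isinstance(text, str):
        raise TypeError(f"expected a single str, got {type(text).__name__}")
    return GLOSSARY.get(text, text)

df = pd.DataFrame({
    "Metric1_JP": ["加重販売率(販売金額)", "加重販売率(販売金額)"],
    "Type_JP": ["ケチャップ", "ケチャップ"],
    "SKU": [4582112498325, 4582112398325],
})

JP_columns = [column for column in df.columns if "_JP" in column]
EN_columns = [column.replace("_JP", "_EN") for column in JP_columns]

jp = df[JP_columns]
# DataFrame.map is the pandas >= 2.1 name; applymap is the older, deprecated one.
cell_wise = jp.map if hasattr(pd.DataFrame, "map") else jp.applymap
# New columns are filled positionally from the translated frame's columns.
df[EN_columns] = cell_wise(fake_translate)

print(df[EN_columns])
```

Checking `hasattr` on the class rather than the instance avoids accidentally matching a column that happens to be named "map".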

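np.vectorize also works, though NumPy documents it as a convenience rather than a performance tool (it is essentially a for loop), so it will not be faster than applymap here. Note that it takes a Python function (pyfunc) as input, in this case translate, and returns a callable that is then called on the DataFrame. In the question's attempt, translate(df[JP_columns]) is evaluated first, so translate receives the whole DataFrame before np.vectorize ever sees it: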

JP_columns = [column for column in df.columns if '_JP' in column]
EN_columns = [column.replace('_JP', '_EN') for column in JP_columns]

# vectorize function then call function on DataFrame
df[EN_columns] = np.vectorize(
    GoogleTranslator(source='ja', target='en').translate
)(df[JP_columns])
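One np.vectorize detail matters when each call is a web request. If `otypes` is not given, NumPy determines the output type by calling the function on the first element, and with the default `cache=False` that element is processed again afterwards, so the first cell is translated twice. The offline sketch below uses a hypothetical `counting_translate` stand-in that only records its calls; passing `otypes=[object]` skips the extra probe call.

```python
import numpy as np
import pandas as pd

calls = []

def counting_translate(text):
    # Hypothetical stand-in for GoogleTranslator(...).translate: records each call.
    calls.append(text)
    return f"EN:{text}"

jp = pd.DataFrame({"Metric1_JP": ["加重販売率(販売金額)", "ｱｲﾃﾑ販売店当り(販売個数)"],
                   "Type_JP": ["ケチャップ", "ケチャップ"]})

# Without otypes: NumPy probes the first element to infer the output dtype.
result_default = np.vectorize(counting_translate)(jp)
calls_default = len(calls)

calls.clear()
# With otypes: no probe call, exactly one call per cell.
result_typed = np.vectorize(counting_translate, otypes=[object])(jp)
calls_typed = len(calls)

print(calls_default, calls_typed)  # 5 4  (2 x 2 cells)
```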

With either approach, df ends up as:

Product_JP	Value1	Metric1_JP	Type_JP	SKU	Product_EN	Metric1_EN	Type_EN
ﾄﾏﾄｺ- ｻﾙｻ C225G	1	ﾏ-ｹｯﾄｻｲｽﾞ(販売金額(x1000))	サルサソ−ス	4582152498325	Tomatoco-Salsa C225G	Market size (sales amount (x1000))	Salsa source
ﾏﾄｹﾁﾔﾂﾌﾟ	12313	加重販売率(販売金額)	ケチャップ	4582112498325	Matthew	Weighted sales rate (sales amount)	ketchup
ﾄﾏﾄｹﾁﾔﾂﾌﾟﾊﾞﾘﾕ-	1.123	ｱｲﾃﾑ販売店当り(販売個数)	ケチャップ	4500152498325	Tomato miser	Per item store (number of units sold)	ketchup
ｹﾁﾔﾂﾌﾟﾊ-ﾌ	0.112	加重販売率(販売金額)	ケチャップ	4582112398325	Catch	Weighted sales rate (sales amount)	ketchup
ﾄﾏﾄｹﾁﾔﾂﾌﾟﾌﾟﾚﾐｱﾑ	0	加重販売率(販売金額)	ケチャップ	4582152483125	Tomato miser premium	Weighted sales rate (sales amount)	ketchup

Setup and imports:

import numpy as np  # only for np.vectorize
import pandas as pd
from deep_translator import GoogleTranslator

df = pd.DataFrame({
    'Product_JP': ['ﾄﾏﾄｺ- ｻﾙｻ C225G', 'ﾏﾄｹﾁﾔﾂﾌﾟ', 'ﾄﾏﾄｹﾁﾔﾂﾌﾟﾊﾞﾘﾕ-', 'ｹﾁﾔﾂﾌﾟﾊ-ﾌ',
                   'ﾄﾏﾄｹﾁﾔﾂﾌﾟﾌﾟﾚﾐｱﾑ'],
    'Value1': [1, 12313, 1.123, 0.112, 0],
    'Metric1_JP': ['ﾏ-ｹｯﾄｻｲｽﾞ(販売金額(x1000))', '加重販売率(販売金額)',
                   'ｱｲﾃﾑ販売店当り(販売個数)', '加重販売率(販売金額)',
                   '加重販売率(販売金額)'],
    'Type_JP': ['サルサソ−ス', 'ケチャップ', 'ケチャップ', 'ケチャップ', 'ケチャップ'],
    'SKU': [4582152498325, 4582112498325, 4500152498325, 4582112398325,
            4582152483125]
})
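Since both approaches send one request per cell, and many cells repeat (ケチャップ appears four times in Type_JP), an optional refinement that goes beyond the approaches above is to translate each distinct string once and map the results back. This is a sketch under the assumption that identical inputs always get identical translations. `translate_once` and `translate_columns` are hypothetical helpers; with deep_translator you would pass `GoogleTranslator(source='ja', target='en').translate` as the `translate` argument instead.

```python
import pandas as pd

calls = []

def translate_once(text):
    # Hypothetical offline stand-in for GoogleTranslator(...).translate.
    calls.append(text)
    return f"EN:{text}"

def translate_columns(df, jp_columns, translate):
    """Translate each distinct value in jp_columns once, then map back per column."""
    # pd.unique keeps values in order of first appearance.
    unique_texts = pd.unique(df[jp_columns].to_numpy().ravel())
    lookup = {text: translate(text) for text in unique_texts}
    for column in jp_columns:
        df[column.replace("_JP", "_EN")] = df[column].map(lookup)
    return df

frame = pd.DataFrame({
    "Metric1_JP": ["加重販売率(販売金額)", "ｱｲﾃﾑ販売店当り(販売個数)", "加重販売率(販売金額)"],
    "Type_JP": ["ケチャップ", "ケチャップ", "ケチャップ"],
})
translate_columns(frame, ["Metric1_JP", "Type_JP"], translate_once)

print(len(calls))  # 3 requests for 6 cells
```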