How do I copy contents from one column to another while using .apply() in pandas?

I have a DataFrame with 2 columns: total_open_amount and invoice_currency.

invoice_currency has the following value counts:

USD    45011
CAD     3828
Name: invoice_currency, dtype: int64

I want to convert all the CAD amounts in the total_open_amount column to USD, based on invoice_currency, at an exchange rate of 1 CAD = 0.7 USD, and store the result in a separate column.

My code:

df_data['converted_usd'] = df_data['total_open_amount'].where(df_data['invoice_currency']=='CAD')
df_data['converted_usd']= df_data['converted_usd'].apply(lambda x: x*0.7)
df_data['converted_usd']

output:

0            NaN
1            NaN
2            NaN
3        2309.79
4            NaN
          ...   
49995        NaN
49996        NaN
49997        NaN
49998        NaN
49999        NaN
Name: converted_usd, Length: 48839, dtype: float64

I was able to fill the new column with the converted CAD values, but how do I now fill the remaining rows with the USD values?

CodePudding user response:

We can use Series.mask or Series.where. Series.mask replaces the values in the rows where the condition is True, here the rows where 'invoice_currency' is CAD. By default those rows would be set to NaN, but with the other parameter we tell it to fill them with the df_data['total_open_amount'] series multiplied by 0.7. The USD rows keep their original value.

Series.where works the other way round: the rows that do not meet the condition are replaced (with NaN by default). So we first multiply the series by 0.7 and keep that result only in the rows where the condition is met, that is, the rows with CAD currency, and we use the other parameter to leave the rest of the rows (USD) with their initial value.

Note that Series.mask and Series.where are the opposite of each other: mask replaces where the condition is True, where replaces where it is False.

df_data['converted_usd'] = df_data['total_open_amount']\
    .mask(df_data['invoice_currency'] == 'CAD', 
          other=df_data['total_open_amount'].mul(0.7))

Or:

df_data['converted_usd'] = df_data['total_open_amount'].mul(0.7)\
    .where(df_data['invoice_currency'] == 'CAD', 
          df_data['total_open_amount'])
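
To see that mask and where really are mirror images, here is a minimal sketch on made-up data (the amounts and currency series below are hypothetical, not taken from the question's DataFrame). It also shows why the original attempt left NaN in the USD rows: where was called without other.

```python
import pandas as pd

# Made-up sample data, for illustration only
amounts = pd.Series([100.0, 200.0, 300.0])
currency = pd.Series(["USD", "CAD", "USD"])
is_cad = currency == "CAD"

# mask: replace the values where the condition is True (CAD rows)
masked = amounts.mask(is_cad, other=amounts * 0.7)

# where: keep the values where the condition is True, replace the rest
kept = (amounts * 0.7).where(is_cad, other=amounts)

# Without `other`, the replaced rows become NaN (as in the question)
only_cad = amounts.where(is_cad)

print(masked.tolist())
print(kept.tolist())
print(only_cad.tolist())
```

The first two printed lists should be identical, while the third has NaN in every non-CAD position.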

NumPy version:

import numpy as np

df_data['converted_usd'] = np.where(
    df_data['invoice_currency'] == 'CAD',
    df_data['total_open_amount'].mul(0.7),
    df_data['total_open_amount'])
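
As a quick sanity check, the three variants can be compared on a small toy frame. This is only a sketch: the rows below are invented stand-ins for df_data, and the helper names (is_cad, amount, via_mask, via_where, via_numpy) are not part of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real df_data
df_data = pd.DataFrame({
    "total_open_amount": [50.0, 3300.0, 10.0, 1000.0],
    "invoice_currency": ["USD", "CAD", "USD", "CAD"],
})

is_cad = df_data["invoice_currency"] == "CAD"
amount = df_data["total_open_amount"]

# The three approaches from the answer
via_mask = amount.mask(is_cad, other=amount.mul(0.7))
via_where = amount.mul(0.7).where(is_cad, amount)
via_numpy = np.where(is_cad, amount.mul(0.7), amount)  # plain ndarray, not a Series

df_data["converted_usd"] = via_mask
print(df_data)
print(via_mask.equals(via_where))
print(np.allclose(via_mask.to_numpy(), via_numpy))
```

One difference worth keeping in mind: np.where returns a NumPy array rather than a Series, so it carries no index. Assigning it to a column of the same DataFrame still works because the values line up by position.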