Pandas apply lambda to a function based on condition

I have a data frame of rental data and would like to annualise the rent: where the frequency column says the rent is monthly, the annual figure should be price * 12.

The frequency column contains the following values - 'Yearly', 'Monthly', nan

I have tried - np.where(df['frequency'] == "Monthly", df['price'].apply(lambda x: x*12), 0)

However, for the monthly rows the figure seems to be repeated 12 times rather than multiplied by 12.

I need the price to be multiplied by 12 but can't figure out how to do this.

CodePudding user response:

The problem is that your price column contains strings, not numeric values. Multiplying a string by 12 repeats it 12 times, which is why the figure looks copied. If you load your dataframe from a file (csv, xlsx), pass thousands=',' to pd.read_csv or pd.read_excel so that a string like '4,500' is interpreted as the number 4500.

Demo:

import pandas as pd
import io

csvdata = """\
frequency;price
Monthly;4,500
Yearly;30,200
"""

df1 = pd.read_csv(io.StringIO(csvdata), sep=';')
df2 = pd.read_csv(io.StringIO(csvdata), sep=';', thousands=',')

For df1:

>>> df1
  frequency   price
0   Monthly   4,500
1    Yearly  30,200

>>> df1.dtypes
frequency    object
price        object  # not numeric
dtype: object

>>> df1['price'] * 2
0      4,5004,500
1    30,20030,200
Name: price, dtype: object

For df2:

>>> df2
  frequency  price
0   Monthly   4500
1    Yearly  30200

>>> df2.dtypes
frequency    object
price         int64  # numeric
dtype: object

>>> df2['price'] * 2
0     9000
1    60400
Name: price, dtype: int64
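
Once the price column is numeric, the annualisation from the question can be sketched as follows. This is a minimal example on the same demo data, not a definitive solution: the column name annual_price is an assumption made for illustration, and rows that are not monthly keep their price unchanged instead of being set to 0.

```python
import io

import numpy as np
import pandas as pd

csvdata = """\
frequency;price
Monthly;4,500
Yearly;30,200
"""

# thousands=',' makes pandas parse '4,500' as the integer 4500
df = pd.read_csv(io.StringIO(csvdata), sep=';', thousands=',')

# Multiply monthly rents by 12; leave every other row as it is
# ('annual_price' is a hypothetical column name)
df['annual_price'] = np.where(df['frequency'] == 'Monthly',
                              df['price'] * 12,
                              df['price'])
print(df)
```

Passing df['price'] as the third argument of np.where keeps the yearly rents; use 0 or np.nan there if non-monthly rows should be blanked out.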

CodePudding user response:

It seems the price column contains strings instead of numbers, so first replace , with . and convert to floats, then multiply by 12:

np.where(df['frequency'] == "Monthly", df['price'].str.replace(',','.').astype(float)*12, 0)

If the , is a thousands separator, replace it with an empty string instead:

np.where(df['frequency'] == "Monthly", df['price'].str.replace(',','').astype(float)*12, 0)
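
If the data frame is already loaded, a slightly more defensive variant might look like the sketch below. It assumes the commas are thousands separators, uses pd.to_numeric with errors='coerce' so that unparseable values become NaN instead of raising, and keeps the price of non-monthly rows as it is. The sample data and the annual_price column name are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample: prices stored as strings, one missing frequency
df = pd.DataFrame({
    'frequency': ['Monthly', 'Yearly', np.nan],
    'price': ['4,500', '30,200', '1,000'],
})

# Strip thousands separators, then convert; bad values become NaN
price = pd.to_numeric(df['price'].str.replace(',', '', regex=False),
                      errors='coerce')

# Monthly rows are multiplied by 12, other rows keep their price
df['annual_price'] = np.where(df['frequency'] == 'Monthly', price * 12, price)
print(df)
```

Comparing a missing frequency with "Monthly" evaluates to False, so rows with nan in the frequency column fall into the "keep the price" branch here.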