How to calculate the value of one column based on another column?-CodePudding

I am stuck on how to write a function to calculate the value of one column based on another column.

For example, my dataframe looks like this:

  Name  Total   Bonus
0 Amy   15000   10k
1 Bob   14000   10%
2 Cathy 13400   5.5k
3 David 14800   10% - 20%
4 Emma  15200   8%
5 Fay   13800   0 - 5%
6 Gina  14500   5k - 10k
...

Code for recreating it:

df = pd.DataFrame()
df["Name"] = ["Amy", "Bob", "Cathy", "David", "Emma", "Fay", "Gina"]
df["Total"] = ["15000", "14000", "13400", "14800","15200","13800","14500"]
df["Bonus"] = ["10k", "10%", "5.5k", "10% - 20%", "8%", "0 - 5%", "5k - 10k"]

The dtype of all columns in the dataframe are Object.

What I want is to convert the Bonus column in an organized way:

If it endswith "k", it converts to int(number * 1000)
If it endswith "%", it converts to int(total * number) // 100
If it is a range (like 10% - 20%), go with the lowest(which is 10%), and convert to int(total * number) // 100

The result df should look like this:

  Name  Total   Bonus
0 Amy   15000   10000
1 Bob   14000   1400
2 Cathy 13400   5500
3 David 14800   1480
4 Emma  15200   1216
5 Fay   13800   0
6 Gina  14500   5000
...

I am struggling with function. I was able to write a function to convert stringNum to int and do the operations. But I couldn't return to a dafaframe.

Here is my code:

def convertToNumber(df):
    for i in df.index:
        bonus = df.Bonus[i]
        if bonus == "0":
            bonus = 0
        elif bonus.endswith('k'):
            kb = bonus.split("k")
            #print(kb)
            bonus = int(float(kb[0]) * 1000)
        elif bonus.endswith("%"):
            total = int(df.Total[i])
            if len(bonus) > 2:
                b = int(bonus[: 2])
            else:
                b = int(bonus[0])
            bonus = total * b // 100
        else:
            bonus = -1
        print(bonus)

convertToNumber(df)
df

And my result is:

10000
1400
5500
1480
1216
0
5000

   Name     Total   Bonus
0   Amy     15000   10k
1   Bob     14000   10%
2   Cathy   13400   5.5k
3   David   14800   10% - 20%
4   Emma    15200   8%
5   Fay     13800   0 - 5%
6   Gina    14500   5k - 10k

CodePudding user response：

First, handle the ranges and select the lower value, then create two boolean masks for k and % separately and then apply all the related logic. For example:

# Handle ranges
df['Bonus'] = df['Bonus'].str.split('-').str[0].str.strip()

# Create boolean masks
ks = df['Bonus'].str.endswith('k')
ps = df['Bonus'].str.endswith('%')

# Remove 'k' and '%' and convert the type to float
df.loc[ks | ps, 'Bonus'] = df.loc[ks | ps, 'Bonus'].str[:-1]
df['Bonus'] = df['Bonus'].astype(float)

# Apply the mask logic and convert to int
df.loc[ks, 'Bonus'] = df.loc[ks, 'Bonus'] * 1000
df.loc[ps, 'Bonus'] = df.loc[ps, 'Total'] * df.loc[ps, 'Bonus'] / 100
df['Bonus'] = df['Bonus'].astype(int)

Result:

    Name  Total  Bonus
0    Amy  15000  10000
1    Bob  14000   1400
2  Cathy  13400   5500
3  David  14800   1480
4   Emma  15200   1216
5    Fay  13800      0
6   Gina  14500   5000