I am stuck on how to write a function to calculate the value of one column based on another column.
For example, my dataframe looks like this:
Name Total Bonus
0 Amy 15000 10k
1 Bob 14000 10%
2 Cathy 13400 5.5k
3 David 14800 10% - 20%
4 Emma 15200 8%
5 Fay 13800 0 - 5%
6 Gina 14500 5k - 10k
...
Code for recreating it:
df = pd.DataFrame()
df["Name"] = ["Amy", "Bob", "Cathy", "David", "Emma", "Fay", "Gina"]
df["Total"] = ["15000", "14000", "13400", "14800","15200","13800","14500"]
df["Bonus"] = ["10k", "10%", "5.5k", "10% - 20%", "8%", "0 - 5%", "5k - 10k"]
The dtype
of all columns in the dataframe are Object
.
What I want is to convert the Bonus column in an organized way:
- If it endswith "k", it converts to int(number * 1000)
- If it endswith "%", it converts to int(total * number) // 100
- If it is a range (like 10% - 20%), go with the lowest(which is 10%), and convert to int(total * number) // 100
The result df should look like this:
Name Total Bonus
0 Amy 15000 10000
1 Bob 14000 1400
2 Cathy 13400 5500
3 David 14800 1480
4 Emma 15200 1216
5 Fay 13800 0
6 Gina 14500 5000
...
I am struggling with function. I was able to write a function to convert stringNum to int and do the operations. But I couldn't return to a dafaframe.
Here is my code:
def convertToNumber(df):
for i in df.index:
bonus = df.Bonus[i]
if bonus == "0":
bonus = 0
elif bonus.endswith('k'):
kb = bonus.split("k")
#print(kb)
bonus = int(float(kb[0]) * 1000)
elif bonus.endswith("%"):
total = int(df.Total[i])
if len(bonus) > 2:
b = int(bonus[: 2])
else:
b = int(bonus[0])
bonus = total * b // 100
else:
bonus = -1
print(bonus)
convertToNumber(df)
df
And my result is:
10000
1400
5500
1480
1216
0
5000
Name Total Bonus
0 Amy 15000 10k
1 Bob 14000 10%
2 Cathy 13400 5.5k
3 David 14800 10% - 20%
4 Emma 15200 8%
5 Fay 13800 0 - 5%
6 Gina 14500 5k - 10k
CodePudding user response:
First, handle the ranges and select the lower value, then create two boolean masks for k
and %
separately and then apply all the related logic. For example:
# Handle ranges
df['Bonus'] = df['Bonus'].str.split('-').str[0].str.strip()
# Create boolean masks
ks = df['Bonus'].str.endswith('k')
ps = df['Bonus'].str.endswith('%')
# Remove 'k' and '%' and convert the type to float
df.loc[ks | ps, 'Bonus'] = df.loc[ks | ps, 'Bonus'].str[:-1]
df['Bonus'] = df['Bonus'].astype(float)
# Apply the mask logic and convert to int
df.loc[ks, 'Bonus'] = df.loc[ks, 'Bonus'] * 1000
df.loc[ps, 'Bonus'] = df.loc[ps, 'Total'] * df.loc[ps, 'Bonus'] / 100
df['Bonus'] = df['Bonus'].astype(int)
Result:
Name Total Bonus
0 Amy 15000 10000
1 Bob 14000 1400
2 Cathy 13400 5500
3 David 14800 1480
4 Emma 15200 1216
5 Fay 13800 0
6 Gina 14500 5000