I try to add a new column "Result" to a dataframe "merged" which it contains the integers merged['Rate1'] and merged['Rate'] if the "Quantity Total" value between 0 and 500 then merged['Rate'] assign for merged['Rate1']. if the "Quantity Total" value is between 500 and 1000,then calculate merged['Rate1']*0.1 merged['Rate']*0.9 for "result" . if the "Quantity Total" value is between 1000 and 1500, then calculate merged['Rate1']*0.2 merged['Rate']*0.8 for "result" . if the "Quantity Total" value is between 1500 and 2000, then calculate merged['Rate1']*0.5 merged['Rate']*0.5 for "result". if the "Quantity Total" value is between 2000 and 2500, then calculate merged['Rate1']*0.25 merged['Rate']*0.75 for "result". if the "Quantity Total" value is between 2500 and 999999999, then it will be used just merged['Rate1']. I mean it will be the same rate for just last case.
I try to use np.where from numpy. Or trying to use dataframe.loc.
Any idea to help me please?
Thank you in advance
My input (1);
merged >>my dataframe is 2 rows. The most important column that I will use in the calculation is Quantity Total. I mean merged['Quantity Total Price']. this column has the number 1000.
Quantity Total Price |
---|
1000 |
My input (2); merged >> Same as the dataframe in item 1 above, but another column. I mean merged['Rate1']. this column has the number 14.5.
Rate1 |
---|
14.5 |
My input (3); merged >> merged['Rate']. this column has the number 15.
Rate |
---|
15 |
My expected output;
if merged['Quantity Total'] >= 0 & merged['Quantity Total'] <= 500 then, I want to use merged['Rate'] directly instead of merged['Rate1'] by creating a new column.
For example, Quantity Total First: 300 and according my if statement 300 is between 0 and 500 then, create new column and write "15" I mean merged['Rate']
merged["Quantity Total"] >= 500) & (merged["Quantity Total"] <= 1000) then,
calculate >> merged['Rate']*0.9 merged['Rate1']*0.1 and write result columns.
other cases are listed above.
CodePudding user response:
It is a bit difficult for me to understand exactly what you want to do (this could be due to my non-native english). I would suggest including a mock example of your input and a mock example of the output you want to get for making your question clearer.
From what I can understand however, using apply might be a good option for you, since you have a lot of custom conditions among different columns.
def test_function(row):
a= getattr(row,"A")
b= getattr(row,"B")
#test_output = "unforeseen"
if b == "b" and a != "a":
test_output = "wrong a"
if b != "b" and a == "a":
test_output = "wrong b"
if b != "b" and a != "a":
test_output = "wrong a & b"
if b == "b" and a == "a":
test_output = "correct a & b"
return(test_output)
test_df = pd.DataFrame(list(zip(["a","a","c","c"],
["b","c","b","c"])),
columns = ["A","B"])
test_df.apply(lambda x: test_function(x), axis =1)
CodePudding user response:
share_dict = {0: 0, 500: 0.1, 1000: 0.2, 1500: 0.5, 2000: 0.75, 2500: 1} # map for Rate1
def rates(key_list, total): # recursive function to return Rate1 and Rate shares
key_list = sorted(key_list)
if total >= key_list[-1]:
return share_dict[key_list[-1]], 1 - share_dict[key_list[-1]]
return rates(key_list[:-1], total)
# creating new calculated column with lambda
merged['calculated'] = merged.apply(lambda x: x['Rate1'] * rates(share_dict.keys(), x['Quantity Total'])[0]
x['Rate'] * rates(share_dict.keys(), x['Quantity Total'])[1], axis=1)
output:
Rate Rate1 Quantity Total calculated
2.5 1.5 3527 1.50
10.0 13.0 0 10.00
12.5 7.5 2190 8.75
11.5 0.5 967 10.40
4.5 0.5 516 4.10
9.0 5.5 20 9.00