I'd like to know a way to substract the values vouchers from credit balance in a dataframe.
There is a column "credit" that will try to match the vouchers used : "v1", "v2", ecc.
So the vouchers should be covered starting from the latest to the most recent. From voucher 3 to voucher 1.
The credit column should try to cover the vouchers (from 3 to 1). If credit surpass the vouchers the remaining credit should be stored in the credit column.
I am using python notebooks with pandas and PySpark libraries.
CodePudding user response:
One of the many ways to acheive this would be using pandas.apply.
See if this helps:
import numpy as np
import pandas as pd
data={
"name":["tom","jim"],
"cummulative":[17,15],
"voucher1":[10,0],
"voucher2":[5,5],
"voucher3":[2,10],
"credit":[20,10]
}
df=pd.DataFrame(data)
def change_order(row):
new_dict=row.to_dict()
credit=row.credit
cummulative=row.cummulative
for i in range(3,0,-1):
current=row[f"voucher{i}"]
if credit>=current:
credit-=current
cummulative-=current
new_dict["credit"]=credit
new_dict["cummulative"]=cummulative
new_dict[f"voucher{i}"]=0
series=pd.Series(new_dict)
return series
df=df.apply(change_order,axis=1)
print(df)
Output:
name cummulative voucher1 voucher2 voucher3 credit
0 tom 0 0 0 0 3
1 jim 5 0 5 0 0
CodePudding user response:
It's actually a very nice exercise.
This is what could be created in PySpark. It feels though, there should be a nicer way...
Input:
from pyspark.sql import functions as F
df = spark.createDataFrame(
[('tom', 17, 10, 5, 2, 20),
('jim', 15, 0, 5, 10, 10)],
['name', 'cumulative_vouchers_used', 'voucher1', 'voucher2', 'voucher3', 'credit'])
Script:
c, v1, v2, v3 = F.col('credit'), F.col('voucher1'), F.col('voucher2'), F.col('voucher3')
subt_v3 = F.when(c >= v3, c - v3).otherwise(c)
new_v3 = F.when(c >= v3, 0).otherwise(v3)
subt_v2 = F.when((subt_v3 >= v2) & (subt_v3 != c), subt_v3 - v2).otherwise(subt_v3)
new_v2 = F.when(subt_v3 >= v2, 0).otherwise(v2)
subt_v1 = F.when((subt_v2 >= v1) & (subt_v2 != subt_v3), subt_v2 - v1).otherwise(subt_v2)
new_v1 = F.when(subt_v2 >= v1, 0).otherwise(v1)
new_cum = new_v1 new_v2 new_v3
df = df.select(
'name',
new_cum.alias('cumulative_vouchers_used'),
new_v1.alias('voucher1'),
new_v2.alias('voucher2'),
new_v3.alias('voucher3'),
subt_v1.alias('credit')
)
df.show()
# ---- ------------------------ -------- -------- -------- ------
# |name|cumulative_vouchers_used|voucher1|voucher2|voucher3|credit|
# ---- ------------------------ -------- -------- -------- ------
# | tom| 0| 0| 0| 0| 3|
# | jim| 5| 0| 5| 0| 0|
# ---- ------------------------ -------- -------- -------- ------