I have multiple dataframes of the same shape. Each consists of 14 columns: user id, 12 months, and the name of the user's plan, with a few hundred rows. The values in the month-based columns of each df are the total call duration, number of messages, amount of traffic used, etc. per month per user.
Something like this:
import pandas as pd

d = {
    'id': [100, 101, 102],
    'call_dur_01': [0, 55, 0],
    'call_dur_02': [30, 40, 0],
    'call_dur_03': [45, 50, 20],
    'plan': ['eco', 'prem', 'eco'],
}
df = pd.DataFrame(data=d)
The problem: calculating revenue per user from the monthly values, based on the user's plan, while accounting for the amount of service included in the monthly fee.
I tried to streamline the process by writing a function that takes the amounts included in the monthly fee (incl) and the cost of extra services per unit, then runs through the chosen dataframe and returns the cost of extra services wherever a user has exceeded the incl value.
def revenue_calc(prem_incl, eco_incl, prem_extra, eco_extra, df):
    for i in range(4):
        for j in range(1, 4):
            if df.iloc[i, 4] == 'prem' and df.iloc[i, j] > prem_incl:
                return (df.iloc[i, j] - prem_incl) * prem_extra
            if df.iloc[i, 4] == 'prem' and df.iloc[i, j] <= prem_incl:
                return 0
            if df.iloc[i, 4] == 'eco' and df.iloc[i, j] > eco_incl:
                return (df.iloc[i, j] - eco_incl) * eco_extra
            if df.iloc[i, 4] == 'eco' and df.iloc[i, j] <= eco_incl:
                return 0
Then I apply it.
call_monthly_revenue = call_dur.apply(revenue_calc(50, 30, 1, 3, call_dur), axis=1)
As output I expect a dataframe of the same shape, where the month-based columns hold the extra amount each user has paid in the given month (the amount past prem_incl or eco_incl, multiplied by prem_extra or eco_extra), and zeros where they didn't exceed the limit.
It's not working. The error also confuses me because there's no explanation at the end, just "AssertionError: "
It's not a job related task, it's a study related task, and just a data prep stage, not even current topic. So I'd very much like to understand what's wrong with this function. Thank you!
CodePudding user response:
Map the values in plan to calculate included_units and extra_cost_per_unit, then filter the call_dur-like columns and subtract included_units to calculate the extra units. Finally, multiply the extra units by the extra_cost_per_unit column to get the charges incurred.
included_units = df['plan'].map({'prem': 50, 'eco': 30})
extra_cost_per_unit = df['plan'].map({'prem': 1, 'eco': 3})
cost_to_customer = (
    df.filter(like='call_dur')
      .sub(included_units, axis=0)
      .clip(lower=0)
      .mul(extra_cost_per_unit, axis=0)
)
Result
>>> cost_to_customer
   call_dur_01  call_dur_02  call_dur_03
0            0            0           45
1            5            0            0
2            0            0            0
# Assign cost_to_customer back to the original dataframe
>>> df.assign(**cost_to_customer)
    id  call_dur_01  call_dur_02  call_dur_03  plan
0  100            0            0           45   eco
1  101            5            0            0  prem
2  102            0            0            0   eco
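Since the question mentions several dataframes of the same shape (calls, messages, traffic), the steps above could be wrapped in a small reusable function. This is only a sketch: the name extra_charges and its parameters are hypothetical, and it assumes every column other than id and plan is a monthly usage column.

```python
import pandas as pd

def extra_charges(df, included, extra_cost, id_col="id", plan_col="plan"):
    """Hypothetical helper: per-month charge for usage beyond the plan allowance.

    included and extra_cost are dicts keyed by plan name.
    Assumes every column except id_col and plan_col holds monthly usage.
    """
    incl = df[plan_col].map(included)      # allowance per row
    cost = df[plan_col].map(extra_cost)    # price per extra unit per row
    months = df.drop(columns=[id_col, plan_col])
    charges = months.sub(incl, axis=0).clip(lower=0).mul(cost, axis=0)
    # Put id and plan back so the result has the same shape as the input
    return pd.concat([df[[id_col]], charges, df[[plan_col]]], axis=1)

call_dur = pd.DataFrame({
    "id": [100, 101, 102],
    "call_dur_01": [0, 55, 0],
    "call_dur_02": [30, 40, 0],
    "call_dur_03": [45, 50, 20],
    "plan": ["eco", "prem", "eco"],
})

call_revenue = extra_charges(
    call_dur,
    included={"prem": 50, "eco": 30},
    extra_cost={"prem": 1, "eco": 3},
)
print(call_revenue)
```

The same call can then be repeated for the messages and traffic dataframes with their own allowances and prices.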
CodePudding user response:
There are multiple issues with this. The AssertionError is raised because you are passing a single value to apply (the return value of revenue_calc), while it actually expects a function it can call.
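A second issue is that return inside the loops ends the function at the very first cell it checks; with the sample data that call simply returns 0. If you do want to keep apply, a rough sketch would be to pass the function itself and let it handle one row at a time. The helper name row_extra_charges and the month_cols list are my own assumptions, not part of your code:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [100, 101, 102],
    "call_dur_01": [0, 55, 0],
    "call_dur_02": [30, 40, 0],
    "call_dur_03": [45, 50, 20],
    "plan": ["eco", "prem", "eco"],
})
month_cols = [c for c in df.columns if c.startswith("call_dur")]

def row_extra_charges(row, prem_incl, eco_incl, prem_extra, eco_extra):
    """Extra charge for every month of one user (one row)."""
    if row["plan"] == "prem":
        incl, extra = prem_incl, prem_extra
    else:  # assumes the only other plan is 'eco'
        incl, extra = eco_incl, eco_extra
    months = row[month_cols].astype("int64")
    # Usage above the allowance, never negative, times the unit price
    return (months - incl).clip(lower=0) * extra

# Pass the function object; extra arguments go through args
monthly = df.apply(row_extra_charges, axis=1, args=(50, 30, 1, 3))
print(monthly)
```

This calls the function once per row, so it will be slower than column-wise operations on large frames.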
apply is not really necessary here, since we can do all the calculations through operations between columns. Here is one way of doing what I think you want to do:
prem_incl, eco_incl, prem_extra, eco_extra = 50, 30, 1, 3
# put the included minutes and tariff in extra columns
df["incl"] = df.plan.replace({"eco": eco_incl, "prem": prem_incl})
df["extra"] = df.plan.replace({"eco": eco_extra, "prem": prem_extra})
month_cols = [c for c in df.columns if c.startswith("call_dur")]
df["revenue_per_user"] = (df[month_cols]
    .sub(df.incl.values, axis=0)   # subtract included minutes
    .clip(lower=0)                 # can't go negative
    .mul(df.extra.values, axis=0)  # multiply all months by the tariff
    .sum(axis=1))                  # sum all months
print(df.drop(["incl", "extra"], axis=1))
## Out:
    id  call_dur_01  call_dur_02  call_dur_03  plan  revenue_per_user
0  100            0           30           45   eco                45
1  101           55           40           50  prem                 5
2  102            0            0           20   eco                 0