Performing calculations on values in a row based on a defining value in the same row

I have multiple dataframes of the same shape. Each has 14 columns (user id, 12 months, and the name of the user's plan) and a few hundred rows. The values in the month-based columns of each df are the total call duration, number of messages, amount of traffic used, etc., per month per user.

Something like this:

import pandas as pd

d = {
    'id': [100, 101, 102],
    'call_dur_01': [0, 55, 0],
    'call_dur_02': [30, 40, 0],
    'call_dur_03': [45, 50, 20],
    'plan': ['eco', 'prem', 'eco'],
}

df = pd.DataFrame(data=d)

The problem: calculate revenue per user from the monthly values, based on the user's plan, while accounting for the amount of service included in the monthly fee.

I tried to streamline the process by writing a function that takes the amounts included in the monthly fee (incl) and the cost per unit of extra service, runs through the chosen dataframe, and returns the cost of extra services whenever a user has exceeded the incl value.

def revenue_calc(prem_incl, eco_incl, prem_extra, eco_extra, df):
    for i in range(4):
        for j in range(1, 4):
            if df.iloc[i, 4] == 'prem' and df.iloc[i, j] > prem_incl:
                return (df.iloc[i, j] - prem_incl) * prem_extra
            if df.iloc[i, 4] == 'prem' and df.iloc[i, j] <= prem_incl:
                return 0
            if df.iloc[i, 4] == 'eco' and df.iloc[i, j] > eco_incl:
                return (df.iloc[i, j] - eco_incl) * eco_extra
            if df.iloc[i, 4] == 'eco' and df.iloc[i, j] <= eco_incl:
                return 0

Then I apply it.

call_monthly_revenue = call_dur.apply(revenue_calc(50, 30, 1, 3, call_dur), axis=1)

As output I expect a dataframe of the same shape, where each month-based column holds the extra amount the user paid in that month (the usage above prem_incl or eco_incl, multiplied by prem_extra or eco_extra), or zero if they didn't exceed the limit (prem_incl or eco_incl).
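For a single user and month, the rule described above can be written as follows, where x is that month's usage, incl is the included amount for the user's plan and extra is the plan's price per extra unit:

$$
\text{charge} = \max(0,\; x - \text{incl}) \cdot \text{extra}
$$

For example, with the values from the apply call (eco_incl = 30, eco_extra = 3), user 100 is on eco and used 45 in call_dur_03, so that month's charge is max(0, 45 - 30) * 3 = 45; in call_dur_02 they used exactly 30, so the charge is 0.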

It doesn't work. The error also confuses me, because there is no explanation at the end, just "AssertionError: ".

It's not a job-related task; it's a study-related one, and just a data prep stage, not even the current topic. So I'd very much like to understand what's wrong with this function. Thank you!

Answer:

Map the values in plan to get included_units and extra_cost_per_unit, then filter the call_dur-like columns and subtract included_units to get the extra units (clipped at zero), and finally multiply the extra units by extra_cost_per_unit to get the charges incurred.

included_units = df['plan'].map({'prem': 50, 'eco': 30})
extra_cost_per_unit = df['plan'].map({'prem': 1, 'eco': 3})

cost_to_customer = (
    df.filter(like='call_dur')
      .sub(included_units, axis=0)
      .clip(lower=0)
      .mul(extra_cost_per_unit, axis=0)
)

Result

>>> cost_to_customer
   call_dur_01  call_dur_02  call_dur_03
0            0            0           45
1            5            0            0
2            0            0            0

# Assign cost_to_customer back to the original dataframe
>>> df.assign(**cost_to_customer)

    id  call_dur_01  call_dur_02  call_dur_03  plan
0  100            0            0           45   eco
1  101            5            0            0  prem
2  102            0            0            0   eco

Answer:

There are multiple issues with this. The AssertionError happens because you are passing a single value to apply (the return value of revenue_calc), while apply expects a function it can call.
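If you do want to keep a row-wise function, pass the function itself and let it handle one whole row at a time. This also avoids the second problem in the original loop: for any row whose plan is 'prem' or 'eco', one of the four branches returns on the very first month, so with the sample data revenue_calc returns 0 after looking at a single cell. Below is a minimal sketch, assuming the sample columns from the question and only the plans 'prem' and 'eco'; extra_cost_row is a hypothetical name.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id': [100, 101, 102],
    'call_dur_01': [0, 55, 0],
    'call_dur_02': [30, 40, 0],
    'call_dur_03': [45, 50, 20],
    'plan': ['eco', 'prem', 'eco'],
})


def extra_cost_row(row, prem_incl, eco_incl, prem_extra, eco_extra):
    """Hypothetical helper: extra charges for every month of ONE user (one row)."""
    # Look up the plan's limits; an unknown plan raises KeyError
    # instead of silently falling through
    incl = {'prem': prem_incl, 'eco': eco_incl}[row['plan']]
    extra = {'prem': prem_extra, 'eco': eco_extra}[row['plan']]
    # A row of a mixed-type frame has object dtype, so convert the months to numbers
    months = row.filter(like='call_dur').astype(float)
    # Compute all months before returning, instead of returning on the first one
    return (months - incl).clip(lower=0) * extra


# Pass the function itself (not its result); the extra arguments go through args=
monthly = df.apply(extra_cost_row, axis=1, args=(50, 30, 1, 3))
print(monthly)
```

This works, but it calls a Python function once per row; the column-wise approach in this answer does the same work with whole-column operations.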

apply is not really necessary here, since we can do all the calculations with operations between columns. Here is one way of doing what I think you want to do:

prem_incl, eco_incl, prem_extra, eco_extra = 50, 30, 1, 3

# put the included minutes and the tariff in extra columns
df["incl"] = df.plan.map({"eco": eco_incl, "prem": prem_incl})
df["extra"] = df.plan.map({"eco": eco_extra, "prem": prem_extra})

month_cols = [c for c in df.columns if c.startswith("call_dur")]
df["revenue_per_user"] = (df[month_cols]
                              .sub(df.incl.values, axis=0)  # subtract included minutes
                              .clip(lower=0)                # can't go negative
                              .mul(df.extra.values, axis=0) # multiply all months by the tariff
                              .sum(axis=1))                 # sum all months

print(df.drop(["incl", "extra"], axis=1))


## Out:
    id  call_dur_01  call_dur_02  call_dur_03  plan  revenue_per_user
0  100            0           30           45   eco                45
1  101           55           40           50  prem                 5
2  102            0            0           20   eco                 0
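If you want to keep the function-style interface from the question (same parameters, output of the same shape), the column-wise approach can be wrapped in a function. This is a minimal sketch, assuming the column names from the sample data ('id' and 'plan', with every other column being a month) and only the plans 'prem' and 'eco'; the name revenue_calc_vectorized is hypothetical.

```python
import pandas as pd


def revenue_calc_vectorized(prem_incl, eco_incl, prem_extra, eco_extra, df):
    """Hypothetical helper: same-shape frame with extra charges per user per month.

    Assumes the columns are 'id', the monthly columns and 'plan', and that
    'plan' only contains 'prem' or 'eco'.
    """
    incl = df["plan"].map({"prem": prem_incl, "eco": eco_incl})      # included units per row
    extra = df["plan"].map({"prem": prem_extra, "eco": eco_extra})   # price per extra unit per row
    months = df.drop(columns=["id", "plan"])   # every monthly column
    charges = (months.sub(incl, axis=0)        # units above the included amount
                     .clip(lower=0)            # at or below the limit -> 0
                     .mul(extra, axis=0))      # price the extra units
    # Put id and plan back so the result has the same shape as the input
    return df[["id"]].join(charges).join(df[["plan"]])


call_dur = pd.DataFrame({
    "id": [100, 101, 102],
    "call_dur_01": [0, 55, 0],
    "call_dur_02": [30, 40, 0],
    "call_dur_03": [45, 50, 20],
    "plan": ["eco", "prem", "eco"],
})

call_monthly_revenue = revenue_calc_vectorized(50, 30, 1, 3, call_dur)
print(call_monthly_revenue)
```

The other dataframes (messages, traffic) can be passed through the same function with their own limits and prices; per-user totals are then the row sums of the month columns, e.g. call_monthly_revenue.drop(columns=["id", "plan"]).sum(axis=1).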