Is there a way to loop through a DataFrame in pandas and multiply by 2 different conditions?

Essentially, I am working with a DataFrame and I am trying to multiply a column by one of two factors depending on a condition. If the value in 'Order description' equals 'Internet Port Charge', the value in the 'amount' column needs to be multiplied by .33, and if not, by 1.9. I keep getting a ValueError. Thank you!

for x in max_sales:
    if max_sales['Order description'] == 'Internet Port Charge':
        max_sales['amount'] * .33
    else:
        max_sales['amount'] * 111.9

 1 for x in max_sales:
----> 2     if max_sales['Order description'] == 'Internet Port Charge':
  3         max_sales['amount'] * .33
  4     else:
  5         max_sales['amount'] * 111.9

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1535     @final
   1536     def __nonzero__(self):
-> 1537         raise ValueError(
   1538             f"The truth value of a {type(self).__name__} is ambiguous. "
   1539             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

CodePudding user response：

If you only have these two conditions, use .loc to select the rows that match each condition, then assign the multiplied values back:

mask = max_sales['Order description'] == 'Internet Port Charge'

# Pass the row mask and the column label to one .loc call;
# chaining .loc[...]['amount'] = ... would assign to a temporary copy.
max_sales.loc[mask, 'amount'] = max_sales.loc[mask, 'amount'] * 0.33
max_sales.loc[~mask, 'amount'] = max_sales.loc[~mask, 'amount'] * 1.9
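To try the masked .loc assignment on something concrete, here is a minimal sketch. The sample rows are made up for illustration; only the column names and the two factors come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
max_sales = pd.DataFrame({
    "Order description": ["Internet Port Charge", "Foobar", "Internet Port Charge"],
    "amount": [100.0, 100.0, 10.0],
})

# Boolean mask: True for rows whose description matches.
is_port_charge = max_sales["Order description"] == "Internet Port Charge"

# One .loc call with (row mask, column label) writes into the original frame.
max_sales.loc[is_port_charge, "amount"] *= 0.33
max_sales.loc[~is_port_charge, "amount"] *= 1.9

print(max_sales)
```

The mask is computed once from the description column, so the second assignment is not affected by the first one having changed 'amount'.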

I don't see what the for x in max_sales is supposed to do, seeing as x isn't used again later.
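For what it's worth, here is a small sketch of why the original loop fails, using made-up sample rows (an assumption, not the asker's data). Iterating over a DataFrame yields column labels, and the comparison produces a whole boolean Series, which `if` cannot reduce to a single True/False.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
max_sales = pd.DataFrame({
    "Order description": ["Internet Port Charge", "Foobar"],
    "amount": [100.0, 100.0],
})

# Iterating over a DataFrame yields its column labels, not its rows.
columns_seen = [x for x in max_sales]
print(columns_seen)  # ['Order description', 'amount']

# Comparing a column to a string gives one boolean per row.
mask = max_sales["Order description"] == "Internet Port Charge"
print(mask.tolist())  # [True, False]

# `if` needs a single True/False, so a multi-element Series raises ValueError.
try:
    if mask:
        pass
    raised = False
except ValueError as err:
    raised = True
    print(err)
```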

CodePudding user response：

You could use NumPy's np.where():

import numpy as np

max_sales['amount'] = np.where(
    max_sales['Order description'] == 'Internet Port Charge',
    max_sales['amount'] * .33,
    max_sales['amount'] * 111.9
)

np.where() evaluates the condition for every row at once: where it is True, the amount is multiplied by 0.33, and where it is False, by 111.9. Because the work is vectorized, it's also significantly faster (and cleaner) than iterating over the DataFrame row by row.
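A small variant of the same idea, as a sketch on made-up sample rows (assumed for illustration): pick the multiplier for each row first, then multiply the column once. Only the column names and the factors are taken from the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
max_sales = pd.DataFrame({
    "Order description": ["Internet Port Charge", "Foobar", "Internet Port Charge"],
    "amount": [100.0, 100.0, 10.0],
})

# One multiplier per row: 0.33 where the description matches, 111.9 elsewhere.
rate = np.where(
    max_sales["Order description"] == "Internet Port Charge", 0.33, 111.9
)

# Element-wise multiplication of the column by the per-row multipliers.
max_sales["amount"] = max_sales["amount"] * rate

print(max_sales)
```

Keeping the multipliers in their own array can make it easier to inspect which factor each row received.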

CodePudding user response：

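for index, row in max_sales.iterrows():
    if row['Order description'] == 'Internet Port Charge':
        # row is a copy, so write the result back through .at
        max_sales.at[index, 'amount'] = row['amount'] * 0.33
    else:
        max_sales.at[index, 'amount'] = row['amount'] * 111.9

If you want an explicit loop, iterate with .iterrows(), which yields each row's index together with the row as a Series. Assigning to row itself does not change the DataFrame, so write the result back with .at (or .loc).

.iterrows() is slow on large DataFrames though, so prefer a vectorized approach when you can.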
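One thing to watch with .iterrows(): the row it hands you is a copy, so assigning to it is not written back to the DataFrame. A minimal sketch with made-up sample rows (assumed for illustration) that shows the difference between writing to row and writing through .at:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
max_sales = pd.DataFrame({
    "Order description": ["Internet Port Charge", "Foobar"],
    "amount": [100.0, 100.0],
})

# Writing to `row` changes only the temporary Series, not max_sales.
for index, row in max_sales.iterrows():
    row["amount"] = row["amount"] * 0.33
unchanged = max_sales["amount"].tolist()
print(unchanged)  # [100.0, 100.0]

# Writing through .at targets the DataFrame cell itself.
for index, row in max_sales.iterrows():
    factor = 0.33 if row["Order description"] == "Internet Port Charge" else 111.9
    max_sales.at[index, "amount"] = row["amount"] * factor

print(max_sales)
```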

CodePudding user response：

You can use apply and lambda:

import pandas as pd


# Set up dummy data
data = [
    ["Internet Port Charge", 20],
    ["Foobar", 20]
]
df = pd.DataFrame(data, columns=["Order description", "amount"])
#       Order description  amount
# 0  Internet Port Charge      20
# 1                Foobar      20


# Use apply and lambda
df["amount"] = df.apply(
    lambda x: x["amount"]*0.33 if x["Order description"] == "Internet Port Charge"
        else x["amount"]*111.9,
    axis=1)
#       Order description  amount
# 0  Internet Port Charge     6.6
# 1                Foobar  2238.0
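If the lambda gets hard to read, a named function works the same way with apply. The sketch below uses a hypothetical helper called scale (not from the thread) on assumed dummy data, and checks that the row-wise result matches the vectorized np.where() version.

```python
import numpy as np
import pandas as pd

# Dummy data, assumed for illustration.
df = pd.DataFrame(
    [["Internet Port Charge", 20.0], ["Foobar", 20.0]],
    columns=["Order description", "amount"],
)


def scale(row):
    """Return the row's amount times the multiplier for its description."""
    if row["Order description"] == "Internet Port Charge":
        return row["amount"] * 0.33
    return row["amount"] * 111.9


# Row-wise: scale() is called once per row (axis=1).
via_apply = df.apply(scale, axis=1)

# Vectorized: the same rule evaluated for all rows at once.
via_where = np.where(
    df["Order description"] == "Internet Port Charge",
    df["amount"] * 0.33,
    df["amount"] * 111.9,
)

same = bool(np.allclose(via_apply, via_where))
print(same)
```

apply with axis=1 still calls Python code once per row, so on large frames the vectorized form is usually the better default.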