conditional operation on dataframe

Time:04-11

My goal is to add a column to the DataFrame based on a condition that takes values from other columns into account.

I have created a simple example that generates the same error:

import pandas as pd

numbers = {'A': [1,2,3,4,5], "B":[2,4,3,3,2]}
df = pd.DataFrame(numbers)

if df.A - df.B > 0:
    df["C"] = df.B*5
else:
    df["C"] = 0

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I am sure the solution is simple but I am a beginner. Thanks for the support.

CodePudding user response:

c = []

for lab, row in df.iterrows():
    curr = 0
    if row['A'] > row['B']:
        curr = row['B'] * 5

    c.append(curr)

df['C'] = c 

CodePudding user response:

#Import pandas module
import pandas as pd

#Lists of data 
list_A = [1,2,3,4,5]
list_B = [2,4,3,3,2]

#Define a dictionary containing lists of data
dictionary = {'A':  list_A,
        'B': list_B}

#Convert the dictionary into DataFrame
data = pd.DataFrame(dictionary)
data

#New list: B*5 where A - B > 0, otherwise 0
new_list=[]
for a, b in zip(data.A, data.B):
    if a - b > 0:
        new_list.append(b*5)
    else:
        new_list.append(0)

#New dataframe
new_dictionary = {'A':  [1,2,3,4,5],
        'B': [2,4,3,3,2],
        'C': new_list}

new_data=pd.DataFrame(new_dictionary)
new_data

A couple of notes: this is a deliberately simple version, and there are certainly smarter and more "pythonic" ways to do it. Finally, I think this website of tutorials can help you.

CodePudding user response:

You can use numpy's where:

import numpy as np

df["C"] = np.where(df["A"] > df["B"], df["B"]*5, 0)
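
To make it clearer why this works where the `if` statement fails, here is a small self-contained sketch using the data from the question. The comparison `df["A"] > df["B"]` yields one True/False value per row rather than a single boolean, which is why a plain `if` cannot decide on it; `np.where` instead applies the choice row by row. The variable name `condition` is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [2, 4, 3, 3, 2]})

# Element-wise comparison: a boolean Series with one entry per row
condition = df["A"] > df["B"]

# Take B*5 where the condition is True, and 0 everywhere else
df["C"] = np.where(condition, df["B"] * 5, 0)

print(df["C"].tolist())  # [0, 0, 0, 15, 10]
```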

CodePudding user response:

You could do this:

df["C"] = df.A - df.B

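# First turn non-positive values into 0s.
# Assign the result back: calling mask(..., inplace=True) on df["C"]
# may act on a copy and leave df unchanged.
df["C"] = df["C"].mask(df["C"] <= 0, 0)

# Then set the value as needed where C > 0.
df["C"] = df["C"].mask(df["C"] > 0, df["B"]*5)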
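
As a possible simplification, the two masking steps can be collapsed into one call to `Series.where`, which keeps a value where the condition is True and substitutes the second argument where it is False. This is only a sketch on the question's sample data, not necessarily what the answer's author had in mind.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [2, 4, 3, 3, 2]})

# Start from B*5, keep it where A > B, and replace it with 0 elsewhere
df["C"] = (df["B"] * 5).where(df["A"] > df["B"], 0)

print(df["C"].tolist())  # [0, 0, 0, 15, 10]
```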