Conditional statement to generate multi-column values in Python

I'm trying to replace the values in columns 'Alloc1' and 'Alloc2' based on a condition on the column 'Number' in the dataframe below.

import numpy as np
import pandas as pd
data = {'ID': ['001', '002', '003', '004'], 'Number': [99, 99, 20, 40], 'Alloc1': [np.nan, np.nan, np.nan, np.nan], 'Alloc2': [np.nan, np.nan, np.nan, np.nan]}
# Create DataFrame.
df = pd.DataFrame(data)

My code to insert values based on the condition is below:

for  numbers  in df["Number"]:
    
    if  (numbers == 99):
        df["Alloc1"] = 31
        df["Alloc2"] = 3

    else:
        df["Alloc1"] = 0
        df["Alloc2"] = numbers/2

The above seems to execute only the else branch, and only for the last value in column "Number" (which is not 99). How can I fix this? A function would be great. The ideal output should be:

final = {'ID': ['001', '002', '003', '004'], 'Number': [99, 99, 20, 40], 'Alloc1': [31, 31, 0, 0], 'Alloc2': [3, 3, 10, 20]}
# Create DataFrame.
final_df = pd.DataFrame(final)

CodePudding user response：

Assuming you can safely overwrite the entire columns Alloc1 and Alloc2, you can use np.where as Henry Ecker suggested:

df['Alloc1'] = np.where(df['Number'] == 99, 31, 0)
df['Alloc2'] = np.where(df['Number'] == 99, 3, df['Number'] / 2).astype(int)

print(df)
    ID  Number  Alloc1  Alloc2
0  001      99      31       3
1  002      99      31       3
2  003      20       0      10
3  004      40       0      20
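
Since the question asks for a function, the two np.where calls above can be wrapped in one. This is only a minimal sketch, assuming the columns can be overwritten; the name add_allocations and its special parameter are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd


def add_allocations(df, special=99):
    """Return a copy of df with Alloc1/Alloc2 filled from 'Number'.

    Hypothetical helper: rows where Number == special get (31, 3),
    all other rows get (0, Number / 2).
    """
    out = df.copy()
    is_special = out['Number'] == special
    out['Alloc1'] = np.where(is_special, 31, 0)
    # Number / 2 is a float, so cast back to int (this truncates odd halves)
    out['Alloc2'] = np.where(is_special, 3, out['Number'] / 2).astype(int)
    return out


df = pd.DataFrame({'ID': ['001', '002', '003', '004'],
                   'Number': [99, 99, 20, 40]})
result = add_allocations(df)
print(result)
```

Working on a copy leaves the original dataframe untouched, which may or may not be what you want; drop the copy() call to modify it in place.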

CodePudding user response：

Try using vectorized operations for this problem:

import numpy as np
import pandas as pd

data = {'ID': ['001', '002', '003', '004'], 'Number': [99, 99, 20, 40], 'Alloc1': [np.nan, np.nan, np.nan, np.nan], 'Alloc2': [np.nan, np.nan, np.nan, np.nan]}
# Create DataFrame.
df = pd.DataFrame(data)

df['Alloc1'] = 0
df['Alloc2'] = df['Number']/2
df.loc[df['Number'] == 99,'Alloc1'] = 31
df.loc[df['Number'] == 99,'Alloc2'] = 3
df

Output:
    ID  Number  Alloc1  Alloc2
0  001      99      31     3.0
1  002      99      31     3.0
2  003      20       0    10.0
3  004      40       0    20.0
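
Alloc2 comes out as float here because df['Number']/2 is a true division. If integer values are wanted, as in the expected output in the question, one option is floor division. A small sketch of that variation, assuming it is acceptable to round odd halves down:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['001', '002', '003', '004'],
                   'Number': [99, 99, 20, 40]})

# Defaults for every row; // is floor division, so the column stays integer
df['Alloc1'] = 0
df['Alloc2'] = df['Number'] // 2

# Boolean mask computed once and reused for both columns
mask = df['Number'] == 99
df.loc[mask, 'Alloc1'] = 31
df.loc[mask, 'Alloc2'] = 3
print(df)
```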

CodePudding user response：

The vectorized solution will likely perform better than this one, and both it and the np.where version are better pandas style. This answer only shows how to get the result with an approach closer to the one you were following. It is not a very 'pandas' way to do things, but it may help in understanding why your attempt didn't work.

import pandas as pd
import numpy as np

data = {'ID': ['001', '002', '003', '004'],
        'Number': [99, 99, 20, 40]}
        # Don't actually need the NaN-filled 'Alloc1' and 'Alloc2' yet
        # Those columns get created when you give them values, later
df = pd.DataFrame(data)

def allocateCodes(row):
    if (row['Number'] == 99):
        row['Alloc1'] = 31
        row['Alloc2'] = 3
    else:
        row['Alloc1'] = 0
        row['Alloc2'] = row['Number'] / 2
    return row

# axis=1 passes one row at a time (a Series holding that row's column values)
# to the function; axis=0, the default, would pass one column at a time
outputDf = df.apply(allocateCodes, axis=1)

print(outputDf)

Outputs:

    ID  Number  Alloc1  Alloc2
0  001      99      31     3.0
1  002      99      31     3.0
2  003      20       0    10.0
3  004      40       0    20.0
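
As for why the loop in the question did not work: df["Alloc1"] = 31 assigns to the whole column, not to the current row. Every pass of the loop therefore overwrites all rows, and only the last value of Number (40, which is not 99) decides the final result. If an explicit loop is really wanted, the assignment has to target one cell at a time, for example with df.at. A small sketch of that idea, which will be slower than the vectorized answers:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['001', '002', '003', '004'],
                   'Number': [99, 99, 20, 40]})

# Create the columns first so there are cells to write into
df['Alloc1'] = 0
df['Alloc2'] = 0.0

# Series.items() yields (index label, value) pairs
for idx, number in df['Number'].items():
    if number == 99:
        df.at[idx, 'Alloc1'] = 31
        df.at[idx, 'Alloc2'] = 3
    else:
        df.at[idx, 'Alloc1'] = 0
        df.at[idx, 'Alloc2'] = number / 2

print(df)
```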