I'm trying to replace values in the columns 'Alloc1' and 'Alloc2' based on a condition on the column 'Number' in the dataframe below.
import numpy as np
import pandas as pd
data = {'ID': ['001', '002', '003', '004'], 'Number': [99, 99, 20, 40], 'Alloc1': [np.nan, np.nan, np.nan, np.nan], 'Alloc2': [np.nan, np.nan, np.nan, np.nan]}
# Create DataFrame.
df = pd.DataFrame(data)
My code to insert values based on the condition is below:
for numbers in df["Number"]:
    if (numbers == 99):
        df["Alloc1"] = 31
        df["Alloc2"] = 3
    else:
        df["Alloc1"] = 0
        df["Alloc2"] = numbers/2
The above seems to execute only the else branch, and only for the last value in column "Number" that is not 99. How can I fix this? A function would be great. The ideal output is:
final = {'ID': ['001', '002', '003', '004'], 'Number': [99, 99, 20, 40], 'Alloc1': [31, 31, 0, 0], 'Alloc2': [3, 3, 10, 20]}
# Create DataFrame.
final_df = pd.DataFrame(final)
CodePudding user response:
Assuming you can safely overwrite the entire columns Alloc1 and Alloc2, you can use np.where as Henry Ecker suggested:
df['Alloc1'] = np.where(df['Number'] == 99, 31, 0)
df['Alloc2'] = np.where(df['Number'] == 99, 3, df['Number'] / 2).astype(int)
print(df)
    ID  Number  Alloc1  Alloc2
0  001      99      31       3
1  002      99      31       3
2  003      20       0      10
3  004      40       0      20
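Since the question asks for a function, the np.where approach above could be wrapped up roughly as follows. This is only a sketch: the name fill_allocations and its flag parameter are hypothetical (they are not part of pandas or NumPy), and it assumes that overwriting both columns is acceptable and that every non-99 Number is even, because the integer cast truncates.

```python
import numpy as np
import pandas as pd


def fill_allocations(df, flag=99):
    """Return a copy of df with Alloc1/Alloc2 derived from Number.

    Hypothetical helper: the name and the `flag` parameter are
    illustrative assumptions, not an existing API.
    """
    out = df.copy()
    is_flag = out["Number"] == flag
    out["Alloc1"] = np.where(is_flag, 31, 0)
    # Number / 2 is float; astype(int) truncates, so even numbers are assumed
    out["Alloc2"] = np.where(is_flag, 3, out["Number"] / 2).astype(int)
    return out


df = pd.DataFrame({"ID": ["001", "002", "003", "004"],
                   "Number": [99, 99, 20, 40]})
result = fill_allocations(df)
print(result)
```

Working on a copy leaves the caller's dataframe untouched; drop the copy() call if modifying it in place is preferred.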
CodePudding user response:
Try using vectorized operations for this problem:
import numpy as np
import pandas as pd
data = {'ID': ['001', '002', '003', '004'], 'Number': [99, 99, 20, 40], 'Alloc1': [np.nan, np.nan, np.nan, np.nan], 'Alloc2': [np.nan, np.nan, np.nan, np.nan]}
# Create DataFrame.
df = pd.DataFrame(data)
df['Alloc1'] = 0
df['Alloc2'] = df['Number']/2
df.loc[df['Number'] == 99,'Alloc1'] = 31
df.loc[df['Number'] == 99,'Alloc2'] = 3
df
Output:
    ID  Number  Alloc1  Alloc2
0  001      99      31     3.0
1  002      99      31     3.0
2  003      20       0    10.0
3  004      40       0    20.0
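If integer results are wanted, as in the question's expected output, one possible variation keeps the condition in a variable and casts Alloc2 at the end. This is a sketch under the assumption that every non-99 Number is even, since astype(int) truncates; the variable name is_99 is my own.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["001", "002", "003", "004"],
                   "Number": [99, 99, 20, 40]})

# Compute the condition once and reuse it for both columns
is_99 = df["Number"] == 99

# Defaults for every row
df["Alloc1"] = 0
df["Alloc2"] = df["Number"] / 2  # division produces a float column

# Override only the rows where Number == 99
df.loc[is_99, "Alloc1"] = 31
df.loc[is_99, "Alloc2"] = 3

# Cast back to integers (drops any fractional part)
df["Alloc2"] = df["Alloc2"].astype(int)
print(df)
```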
CodePudding user response:
I think the 'vectorized' solution will perform better than this one, and both it and the np.where version are better pandas style. This answer just shows how to achieve what you wanted with an approach closer to the one you were following. It is not a very 'pandas' way of doing things, but it might help in understanding why what you were trying didn't work.
import pandas as pd
import numpy as np
data = {'ID': ['001', '002', '003', '004'],
        'Number': [99, 99, 20, 40]}
# Don't actually need the NaN-filled 'Alloc1' and 'Alloc2' yet
# Those columns get created when you give them values, later
df = pd.DataFrame(data)
def allocateCodes(row):
    if (row['Number'] == 99):
        row['Alloc1'] = 31
        row['Alloc2'] = 3
    else:
        row['Alloc1'] = 0
        row['Alloc2'] = row['Number'] / 2
    return row
# axis=1 means 'take each row' (i.e., a whole set of columns) at a time
# instead of 'take each column' (axis=0)
outputDf = df.apply(allocateCodes, axis=1)
print(outputDf)
Outputs:
    ID  Number  Alloc1  Alloc2
0  001      99      31     3.0
1  002      99      31     3.0
2  003      20       0    10.0
3  004      40       0    20.0
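As for why the loop in the question ends up with only the else values: an assignment like df["Alloc1"] = 0 writes to the whole column, not to the current row, so each iteration overwrites the previous one and the last Number (40) wins. The sketch below reproduces that behaviour and then, purely for comparison, shows a row-wise loop that writes one cell at a time with df.at. The variable name whole_column is my own, and the loop is for illustration only; it will be slower than the vectorized answers.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["001", "002", "003", "004"],
                   "Number": [99, 99, 20, 40]})

# Original pattern: each assignment replaces the entire column
for number in df["Number"]:
    if number == 99:
        df["Alloc1"] = 31
        df["Alloc2"] = 3
    else:
        df["Alloc1"] = 0
        df["Alloc2"] = number / 2

# Only the last iteration (Number == 40) survives in every row
whole_column = df[["Alloc1", "Alloc2"]].copy()
print(whole_column)

# Row-wise alternative: df.at[index, column] writes a single cell
for idx, number in df["Number"].items():
    if number == 99:
        df.at[idx, "Alloc1"] = 31
        df.at[idx, "Alloc2"] = 3
    else:
        df.at[idx, "Alloc1"] = 0
        df.at[idx, "Alloc2"] = number / 2
print(df)
```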