I am using df.iterrows() for the following task.
Col3 in the below table is based on the values in col1 & col2.
------ ------ -------
| col1 | col2 | col3 |
------ ------ -------
| 1 | A | White |
| 1 | B | Green |
| 2 | A | Blue |
| 2 | B | Red |
------ ------ -------
This is the code I am using, it takes a while as the DataFrame has many rows.
for i, r in df.iterrows():
if r.col1 == "1" and r.col2 == "A":
value = "White"
elif r.col1 == "1" and r.col2 != "B":
value = "Green"
elif r.col1 == "2" and r.col2 == "A":
value = "Blue"
elif r.col1 == "2" and r.col2 != "B":
value = "Red"
else:
value = None
df.loc[i, 'col3'] = value
I am convinced there must be a better way but my searches haven't been successful so far.
CodePudding user response:
Use numpy.select
:
import numpy as np
conditions = [df["col1"].eq(1)&df["col2"].eq("A"),
df["col1"].eq(1)&df["col2"].eq("B"),
df["col1"].eq(2)&df["col2"].eq("A"),
df["col1"].eq(2)&df["col2"].eq("B")]
choices = ["White", "Green", "Blue", "Red"]
df["col3"] = np.select(conditions, choices, None)
CodePudding user response:
A probably less performing solution would be to create an aggregation column and use map to translate it:
df['col3'] = (df.col1.astype(str) df.col2).map({'1A':"White", '1B':"Green", '2A':"Blue", '2B':"Red"})