Home > Software engineering >  Panda Dataframe - Using .iterrows() is there a better way?
Panda Dataframe - Using .iterrows() is there a better way?

Time:06-05

I am using df.iterrows() for the following task.

Col3 in the below table is based on the values in col1 & col2.

 ------ ------ ------- 
| col1 | col2 | col3  |
 ------ ------ ------- 
|    1 | A    | White |
|    1 | B    | Green |
|    2 | A    | Blue  |
|    2 | B    | Red   |
 ------ ------ ------- 

This is the code I am using, it takes a while as the DataFrame has many rows.

for i, r in df.iterrows():
    if r.col1 == "1" and r.col2 == "A":
        value = "White"
    elif r.col1 == "1" and r.col2 != "B":
        value = "Green"
    elif r.col1 == "2" and r.col2 == "A":
        value = "Blue"
    elif r.col1 == "2" and r.col2 != "B":
        value = "Red"
    else:
        value = None

    df.loc[i, 'col3'] = value
    

I am convinced there must be a better way but my searches haven't been successful so far.

CodePudding user response:

Use numpy.select:

import numpy as np

conditions = [df["col1"].eq(1)&df["col2"].eq("A"),
              df["col1"].eq(1)&df["col2"].eq("B"),
              df["col1"].eq(2)&df["col2"].eq("A"),
              df["col1"].eq(2)&df["col2"].eq("B")]

choices = ["White", "Green", "Blue", "Red"]

df["col3"] = np.select(conditions, choices, None)

CodePudding user response:

A probably less performing solution would be to create an aggregation column and use map to translate it:

df['col3'] = (df.col1.astype(str) df.col2).map({'1A':"White", '1B':"Green", '2A':"Blue", '2B':"Red"})
  • Related