Passing multiple values to a function in Pandas to choose a value

Time:09-26

I'm trying to get to grips with pandas, and I know that iterating over rows in pandas is something you should only do if you really have no other option (I think so, anyway?).

My scenario is:

I have a list of data with 3 columns.

Each column has a different entry, for example:

Color 1   Color 2   Color 3
Blue      Red       Green
Green     Purple    Black

I want to go over every row and call a function like

def choose_color(list_of_colors_in_row):
    if "Blue" in list_of_colors_in_row:
        return "Blue"
    else:
        return list_of_colors_in_row[0]

And add this to a new column in the dataframe.

So in the example table, it would add a new column containing Blue and Green. (The function is just an example.)

I know I could do this by iterating over the rows, but I was wondering if there is a better way. I think you aren't supposed to edit a DataFrame while you are iterating over it, right?

I've looked up the apply function and thought that might do it, but I can't figure out how to pass all 3 of the values as arguments.

Apologies if this is a silly question - I really appreciate any help from anyone. Thank you!

CodePudding user response:

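Try either one of these. df.apply(..., axis=1) calls the function once per row and passes that row in as a Series. Note that "Blue" in row checks the Series index (the column names), not the cell values, so convert the row to a list (or use row.values) before testing membership: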

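def choose_color(list_of_colors_in_row):
    if "Blue" in list_of_colors_in_row:
        return "Blue"
    else:
        return list_of_colors_in_row[0]

# Convert each row (a Series) to a list so "in" tests the values, not the column names
df["New_Column"] = df.apply(lambda row: choose_color(row.tolist()), axis=1)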

Or:

# x.values holds the cell values; x.iloc[0] is the first cell by position
df["New_Column1"] = df.apply(lambda x: "Blue" if "Blue" in x.values else x.iloc[0], axis=1)
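A small self-contained check of why the row has to be converted first. This sketch assumes the column names Color 1 to Color 3 from the question's table; it shows that a membership test on a Series looks at its index labels, while row.values or row.tolist() look at the cells.

```python
import pandas as pd

# Sample data matching the table in the question (column names assumed)
df = pd.DataFrame({
    "Color 1": ["Blue", "Green"],
    "Color 2": ["Red", "Purple"],
    "Color 3": ["Green", "Black"],
})

row = df.iloc[0]  # first row as a Series: Blue, Red, Green

# `in` on a Series tests the index (the column labels), not the cell values
print("Blue" in row)           # False
print("Color 1" in row)        # True

# Testing the values instead finds "Blue"
print("Blue" in row.values)    # True
print("Blue" in row.tolist())  # True

# Row-wise apply that tests the values and falls back to the first cell
df["New_Column"] = df.apply(
    lambda r: "Blue" if "Blue" in r.values else r.iloc[0], axis=1
)
print(df["New_Column"].tolist())  # ['Blue', 'Green']
```

Using r.iloc[0] rather than r[0] keeps the fallback explicitly positional; plain integer lookups on a Series with string labels are deprecated in recent pandas versions.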

CodePudding user response:

You can use np.where:

import numpy as np

df['new_column'] = np.where((df == 'Blue').any(axis=1), 'Blue', df['Color 1'])

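np.where is essentially a vectorized if-else statement: for each row where any of the columns contains Blue, it returns Blue; otherwise it returns that row's value from df['Color 1']. Because it works on whole columns at once, it avoids a Python-level loop over the rows.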
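A runnable version of the np.where approach, assuming the column names Color 1 to Color 3 from the question's table. It builds the boolean mask first so each step can be inspected separately.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Color 1": ["Blue", "Green"],
    "Color 2": ["Red", "Purple"],
    "Color 3": ["Green", "Black"],
})

# Boolean Series: True for each row where at least one cell equals "Blue"
has_blue = (df == "Blue").any(axis=1)

# Element-wise if-else: "Blue" where has_blue is True, otherwise that row's Color 1
df["new_column"] = np.where(has_blue, "Blue", df["Color 1"])

print(df["new_column"].tolist())  # ['Blue', 'Green']
```

df == "Blue" compares every column, so if the real DataFrame has other columns, restrict the comparison to the color columns, e.g. (df[["Color 1", "Color 2", "Color 3"]] == "Blue").any(axis=1).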

CodePudding user response:

Thank you to 9769953 for the help

This is the working code I got:

import pandas as pd

data = {'color_1': ['Blue', 'Green'],
        'color_2': ['Red', 'Purple'],
        'color_3': ['Green', 'Black'],
        }
# Create the pandas DataFrame
df = pd.DataFrame(data)

def choose_color(row):
    list_of_colors_in_row = [row["color_1"], row["color_2"], row["color_3"]]
    if "Blue" in list_of_colors_in_row:
        return "Blue"
    else:
        return list_of_colors_in_row[0]

df["New Column"] = df.apply(choose_color, axis=1)


print(df.head())
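A sketch of a variant that avoids hard-coding the column names inside the function: the color columns are listed once (color_columns is a hypothetical name) and each row is passed to choose_color as a plain list. The second assignment swaps in a different technique, a list comprehension over itertuples, which collects all results first and assigns the new column once, so the DataFrame is never modified while it is being iterated.

```python
import pandas as pd

data = {
    "color_1": ["Blue", "Green"],
    "color_2": ["Red", "Purple"],
    "color_3": ["Green", "Black"],
}
df = pd.DataFrame(data)

def choose_color(colors):
    """Return "Blue" if it appears in the list, otherwise the first color."""
    if "Blue" in colors:
        return "Blue"
    return colors[0]

# Hypothetical helper list: the columns choose_color should look at
color_columns = ["color_1", "color_2", "color_3"]

# Row-wise apply; row.tolist() turns the row Series into a plain list of values
df["New Column"] = df[color_columns].apply(lambda row: choose_color(row.tolist()), axis=1)

# Alternative: iterate with itertuples, collect results, assign once at the end
df["New Column 2"] = [
    choose_color(list(t)) for t in df[color_columns].itertuples(index=False)
]

print(df["New Column"].tolist())    # ['Blue', 'Green']
print(df["New Column 2"].tolist())  # ['Blue', 'Green']
```

Selecting df[color_columns] before iterating also keeps newly added columns (such as New Column) out of the values passed to choose_color.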
