I'm trying to get to grips with pandas, and I understand that iterating over rows is something you should only do if you really have no other option (at least, I think so).
My scenario is:
I have a list of data with 3 columns.
Each column has a different entry, for example:
Color 1 | Color 2 | Color 3 |
---|---|---|
Blue | Red | Green |
Green | Purple | Black |
I want to go over every row and call a function like
def choose_color(list_of_colors_in_row):
    if "Blue" in list_of_colors_in_row:
        return "Blue"
    else:
        return list_of_colors_in_row[0]
And add this to a new column in the dataframe.
So in the example table, it would add a new column with Blue, Green (The function is just an example)
I know I could do this by iterating over the rows, but I was wondering if there is a better way? I think you aren't supposed to edit a dataframe when you are iterating over it right?
I've looked up the apply function and thought that might do it - but I can't seem to figure out how to pass all 3 of the values in the arguments.
Apologies if this is a silly question - I really appreciate any help from anyone. Thank you!
CodePudding user response:
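Try either of these. With axis=1, apply calls the function once per row and passes that row in as a Series.
def choose_color(list_of_colors_in_row):
    if "Blue" in list_of_colors_in_row:
        return "Blue"
    else:
        return list_of_colors_in_row[0]

# x is a Series; tolist() hands the function the row's values as a plain list
df["New_Column"] = df.apply(lambda x: choose_color(x.tolist()), axis=1)
Or:
# x.values tests the row's entries; x.iloc[0] is the first entry by position
df["New_Column1"] = df.apply(lambda x: "Blue" if "Blue" in x.values else x.iloc[0], axis=1)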
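One thing worth checking when a row is passed in by apply: on a pandas Series, the in operator looks at the index labels (here the column names), not at the entries, so a membership test on the raw row can quietly miss "Blue". Below is a minimal sketch of that behaviour. The DataFrame construction is my assumption of what the question's table looks like, and the variable names are only illustrative.

```python
import pandas as pd

# Sample data mirroring the table in the question (assumed construction)
df = pd.DataFrame({
    "Color 1": ["Blue", "Green"],
    "Color 2": ["Red", "Purple"],
    "Color 3": ["Green", "Black"],
})

# The same kind of Series that apply(axis=1) hands to the function
row = df.iloc[0]

label_hit = "Blue" in row          # checks index labels ("Color 1", ...)
value_hit = "Blue" in row.values   # checks the entries in the row
column_hit = "Color 1" in row      # "Color 1" is an index label

print(label_hit, value_hit, column_hit)

# Row-wise choice based on the entries rather than the labels
picked = df.apply(
    lambda r: "Blue" if "Blue" in r.values else r.iloc[0],
    axis=1,
)
print(picked.tolist())
```

Converting with .tolist() instead of .values should work just as well if the function expects a plain list.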
CodePudding user response:
You can use np.where:
import numpy as np
df['new_column'] = np.where((df == 'Blue').any(axis=1), 'Blue', df['Color 1'])
where np.where is basically a vectorized if-else statement. If any of the columns in a row contains Blue, it returns Blue for that row; otherwise it returns that row's value from df['Color 1'].
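For completeness, here is a self-contained sketch of the np.where approach on the sample table from the question. The DataFrame construction and the color_cols and has_blue names are my own assumptions for illustration. Restricting the comparison to the color columns is optional, but it keeps any other columns in the frame out of the check.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the table in the question (assumed construction)
df = pd.DataFrame({
    "Color 1": ["Blue", "Green"],
    "Color 2": ["Red", "Purple"],
    "Color 3": ["Green", "Black"],
})

# Hypothetical helper list: only these columns are searched for "Blue"
color_cols = ["Color 1", "Color 2", "Color 3"]

# Boolean Series: True for rows where any color column equals "Blue"
has_blue = (df[color_cols] == "Blue").any(axis=1)

# Where has_blue is True take "Blue", otherwise take the row's first color
df["new_column"] = np.where(has_blue, "Blue", df["Color 1"])

print(df)
```

Because the comparison and the selection each run once over whole columns, this avoids calling a Python function per row, which is usually the reason to prefer it over apply on larger frames.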
CodePudding user response:
Thank you to 9769953 for the help
This is the working code I got:
import pandas as pd

data = {
    'color_1': ['Blue', 'Green'],
    'color_2': ['Red', 'Purple'],
    'color_3': ['Green', 'Black'],
}

# Create the pandas DataFrame
df = pd.DataFrame(data)

def choose_color(row):
    # Collect the row's three color entries into a plain list
    list_of_colors_in_row = [row["color_1"], row["color_2"], row["color_3"]]
    if "Blue" in list_of_colors_in_row:
        return "Blue"
    else:
        return list_of_colors_in_row[0]

# axis=1 calls choose_color once per row
df["New Column"] = df.apply(choose_color, axis=1)
print(df.head())