I have several columns in a dataframe that take the values Green, Yellow or Red.
Sample:
Date | Index1 | Index2 |
---|---|---|
20-Dec-21 | Green | Yellow |
21-Dec-21 | Red | Yellow |
I want to add one more column to this dataframe that first assigns a score to each column based on the logic: Score = 1 if Green, 0.5 if Yellow, 0 if Red, and then adds these individual scores to produce a final score. E.g. for row 1, score = 1 + 0.5 = 1.5, for row 2, score = 0 + 0.5 = 0.5, and so on.
The func itself is easy to write:
def color_to_score(x):
    if x == 'Green':
        return 1
    elif x == 'Yellow':
        return 0.5
    else:
        return 0
But I am struggling to find an elegant way to apply this to each column and then add the resulting scores across columns to produce a new column. I can obviously do something like:
df['Index1score'] = df['Index1'].apply(color_to_score)
to produce a score column for each of the relevant columns and then add them, but that is inelegant and does not scale to many columns. Looking for help.
CodePudding user response:
Here is an alternative using replace():
# Raw string so that \w is passed to the regex engine unchanged
replace_dict = {'Green':1,'Yellow':.5,r'\w':0}
df.assign(new_col = df[['col1','col2']].replace(replace_dict,regex=True).sum(axis=1))
Also, instead of using \w to replace all other words with 0, you could use pd.to_numeric() and set errors='coerce' to convert all non-numeric values to NaN:
replace_dict = {'Green':1,'Yellow':.5}
df.assign(new_col = pd.to_numeric(df[['col1','col2']].replace(replace_dict).stack(),errors='coerce').unstack().sum(axis=1))
Output:
Date col1 col2 new_col
0 20-Dec-21 Green Yellow 1.5
1 21-Dec-21 Red Yellow 0.5
CodePudding user response:
1. You need to supply axis=1 to apply() so that the function is applied to each row.
2. x in your function would then be a row (not a cell).
3. You can convert x, the row, to a list.
4. Count how many times each value is in the list, and multiply that count by the value's score.
5. Sum, and output the result.
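The steps above could be sketched roughly as follows. This is only an illustration: the names SCORES, row_score and color_cols are made up for the example, and the DataFrame is rebuilt from the sample in the question.

```python
import pandas as pd

# Scores taken from the question; SCORES is a hypothetical name.
SCORES = {"Green": 1.0, "Yellow": 0.5, "Red": 0.0}

def row_score(row):
    """Count each colour in the row and weight the count by its score."""
    values = row.tolist()  # the row converted to a plain list
    return sum(values.count(color) * score for color, score in SCORES.items())

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["20-Dec-21", "21-Dec-21"],
    "Index1": ["Green", "Red"],
    "Index2": ["Yellow", "Yellow"],
})

color_cols = ["Index1", "Index2"]  # assumed list of the colour columns
# axis=1 hands row_score one row at a time instead of one column.
df["Score"] = df[color_cols].apply(row_score, axis=1)
print(df)
```

Selecting only the colour columns before calling apply() keeps Date out of the count, and adding another index column only means extending color_cols.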
CodePudding user response:
Make your life easy by choosing Python over pandas.
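import pandas as pd

score_dict = {'Green': 1, 'Yellow': 0.5, 'Red': 0}
df = pd.DataFrame(data)  # data: the sample records from the question
df["total_score"] = 0.0
for index, row in df.iterrows():
    # Look up the score of each colour and add them together
    df.at[index, "total_score"] = score_dict[row["Index1"]] + score_dict[row["Index2"]]
print(df)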
CodePudding user response:
Came up with this solution.
scores = []
for index in range(len(df.index)):
    scoreTotal = 0
    for column in df.columns:
        # Non-colour cells (e.g. Date) score 0 in color_to_score
        color = df[column].iloc[index]
        scoreTotal += color_to_score(color)  # accumulate instead of overwriting
    scores.append(scoreTotal)
df["Score"] = scores