Run the same function on multiple columns and return a new column

I have several columns in a dataframe that have the values Green/ Yellow/ Red:

Sample:

Date	Index1	Index2
20-Dec-21	Green	Yellow
21-Dec-21	Red	Yellow

I want to add one more column to this dataframe. It should first assign a score to each column based on the logic Score = 1 if Green, 0.5 if Yellow, 0 if Red, and then add these individual scores to produce a final score. E.g. for row 1, score = 1 + 0.5 = 1.5; for row 2, score = 0 + 0.5 = 0.5; and so on.

The func itself is easy to write:

def color_to_score(x):
    if (x=='Green'):
        return 1
    elif (x=='Yellow'):
        return 0.5
    else: return 0

But I am struggling to apply this to each column and then add the resulting scores across columns to produce a new column in an elegant way. I can obviously do something like:

df['Index1score'] = df['Index1'].apply(color_to_score)

to produce a score column for each of the relevant columns and then add them but that is very inelegant and not scalable. Looking for help.

CodePudding user response：

Here is an alternative using replace():

replace_dict = {'Green':1,'Yellow':.5,r'\w':0}
df.assign(new_col = df[['col1','col2']].replace(replace_dict,regex=True).sum(axis=1))

Also, instead of using \w to replace all other words with 0, you could use pd.to_numeric() with errors='coerce' to convert all non-numeric values to NaN, which sum() skips by default:

replace_dict = {'Green':1,'Yellow':.5}
df.assign(new_col = pd.to_numeric(df[['col1','col2']].replace(replace_dict).stack(),errors='coerce').unstack().sum(axis=1))

Output:

        Date   col1    col2  new_col
0  20-Dec-21  Green  Yellow      1.5
1  21-Dec-21    Red  Yellow      0.5

CodePudding user response：

You need to pass axis=1 to apply so that the function is called on each row. x in your function would then be a row (not a cell).
You can convert x, the row, to a list.
Count how many times each value appears in the list, and multiply that count by the value's score.
Sum the products, and return the result.
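
The steps above could be sketched roughly as follows, using the sample data from the question. The names score_map, row_score and color_cols are made up for illustration, and the sketch assumes only Index1 and Index2 should be scored.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Date": ["20-Dec-21", "21-Dec-21"],
    "Index1": ["Green", "Red"],
    "Index2": ["Yellow", "Yellow"],
})

# Hypothetical helper names, not from the original post
score_map = {"Green": 1, "Yellow": 0.5, "Red": 0}

def row_score(row):
    values = row.tolist()  # the row as a plain Python list
    # Count each colour in the row and weight the count by its score
    return sum(values.count(color) * score for color, score in score_map.items())

# Assumption: only these columns hold colours
color_cols = ["Index1", "Index2"]

# axis=1 passes one row at a time to row_score
df["Score"] = df[color_cols].apply(row_score, axis=1)
print(df)
```

Selecting the colour columns before calling apply keeps Date out of the count; any value missing from score_map simply contributes nothing.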

CodePudding user response：

Make your life easy by choosing Python over pandas.

score_dict = {'Green': 1, 'Yellow': 0.5, 'Red': 0}

df = pd.DataFrame(data)
df["total_score"] = 0.0

for index, row in df.iterrows():
    df.at[index, "total_score"] = score_dict[row["Index1"]] + score_dict[row["Index2"]]

print(df)

CodePudding user response：

Came up with this solution.

scores = []
for index in range(len(df.index)):
    scoreTotal = 0
    for column in df.columns:
        color = df[column][index]
        scoreTotal += color_to_score(color)

    scores.append(scoreTotal)

df["Score"] = scores
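
Note that the loop visits every column, including Date, which only scores 0 because color_to_score falls through to its else branch. A possible variant, assuming the colour columns are Index1 and Index2 (color_cols is a made-up name), maps the question's function over just those columns and sums across each row:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Date": ["20-Dec-21", "21-Dec-21"],
    "Index1": ["Green", "Red"],
    "Index2": ["Yellow", "Yellow"],
})

# Scoring function from the question
def color_to_score(x):
    if x == "Green":
        return 1
    elif x == "Yellow":
        return 0.5
    else:
        return 0

# Assumption: only these columns hold colours
color_cols = ["Index1", "Index2"]

# Map the function over each colour column, then add the scores row by row
df["Score"] = df[color_cols].apply(lambda col: col.map(color_to_score)).sum(axis=1)
print(df)
```

Adding another colour column then only means extending color_cols.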