Run same function on multiple columns and return a new column

Time:12-30

I have several columns in a dataframe that have the values Green/ Yellow/ Red:

Sample:

Date       Index1  Index2
20-Dec-21  Green   Yellow
21-Dec-21  Red     Yellow

I want to add one more column to this dataframe. It should first assign a score to each column's value based on the logic: Score = 1 if Green, 0.5 if Yellow, 0 if Red, and then add these individual scores to produce a final score. E.g. for row 1, score = 1 + 0.5 = 1.5; for row 2, score = 0 + 0.5 = 0.5; and so on.

The function itself is easy to write:

def color_to_score(x):
    if x == 'Green':
        return 1
    elif x == 'Yellow':
        return 0.5
    else:
        return 0

But I am struggling to find an elegant way to apply this to each column and then add the resulting scores across columns to produce a new column. I can obviously do something like:

df['Index1score'] = df['Index1'].apply(color_to_score)

to produce a score column for each of the relevant columns and then add them, but that is inelegant and does not scale to many columns. Looking for help.

CodePudding user response:

Here is an alternative using replace():

replace_dict = {'Green':1,'Yellow':.5,r'\w':0}
df.assign(new_col = df[['col1','col2']].replace(replace_dict,regex=True).sum(axis=1))

Also, instead of using \w to replace all other words with 0, you could use pd.to_numeric() with errors='coerce' to convert all non-numeric values to NaN, which sum() skips by default:

replace_dict = {'Green':1,'Yellow':.5}
df.assign(new_col = pd.to_numeric(df[['col1','col2']].replace(replace_dict).stack(),errors='coerce').unstack().sum(axis=1))

Output:

        Date   col1    col2  new_col
0  20-Dec-21  Green  Yellow      1.5
1  21-Dec-21    Red  Yellow      0.5

CodePudding user response:

  1. You need to pass axis=1 to apply() so the function is applied to each row.
  2. x in your function will then be a row (not a cell).
  3. You can convert x, the row, to a list.
  4. Count how many times each value occurs in the list, and multiply that count by the value's score.
  5. Sum the products and return the result.
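The steps in this answer could be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the names `SCORES`, `color_cols`, and `row_score` are made up here, and the score values are taken from the question.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Date": ["20-Dec-21", "21-Dec-21"],
    "Index1": ["Green", "Red"],
    "Index2": ["Yellow", "Yellow"],
})

# Hypothetical names: SCORES, color_cols and row_score are illustrative only.
SCORES = {"Green": 1.0, "Yellow": 0.5, "Red": 0.0}
color_cols = ["Index1", "Index2"]

def row_score(row):
    # Convert the row (a Series) to a plain list of color strings.
    values = list(row)
    # Count each color in the row and weight the count by that color's score.
    return sum(values.count(color) * score for color, score in SCORES.items())

# axis=1 passes each row (not each cell) to row_score.
df["Score"] = df[color_cols].apply(row_score, axis=1)
print(df)
```

Selecting `color_cols` before calling `apply` keeps the Date column out of the count. Any value that is not a key of `SCORES` contributes nothing, which matches the `else` branch of the question's function.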

CodePudding user response:

Make your life easy by using a plain Python loop and a dictionary lookup instead of pandas-specific methods.

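import pandas as pd

score_dict = {'Green': 1, 'Yellow': 0.5, 'Red': 0}

# Sample rows from the question
data = {'Date': ['20-Dec-21', '21-Dec-21'],
        'Index1': ['Green', 'Red'],
        'Index2': ['Yellow', 'Yellow']}
df = pd.DataFrame(data)
df["total_score"] = 0.0

# Look up each row's two colors and store their sum
for index, row in df.iterrows():
    df.at[index, "total_score"] = score_dict[row["Index1"]] + score_dict[row["Index2"]]

print(df)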

CodePudding user response:

Came up with this solution.

scores = []
for index in range(len(df.index)):
    scoreTotal = 0
    for column in df.columns:
        color = df[column][index]
        scoreTotal += color_to_score(color)

    scores.append(scoreTotal)

df["Score"] = scores
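
The nested loop can probably also be written column-wise while still reusing `color_to_score` from the question. A minimal sketch, assuming the color columns are named `Index1` and `Index2` as in the sample; `color_cols` and `scored` are illustrative names, not part of the original answer.

```python
import pandas as pd

def color_to_score(x):
    # Scoring rule from the question: anything else (e.g. Red) scores 0.
    if x == "Green":
        return 1
    elif x == "Yellow":
        return 0.5
    else:
        return 0

# Sample data from the question.
df = pd.DataFrame({
    "Date": ["20-Dec-21", "21-Dec-21"],
    "Index1": ["Green", "Red"],
    "Index2": ["Yellow", "Yellow"],
})

# Assumption: only these columns hold colors; extend the list as needed.
color_cols = ["Index1", "Index2"]

# Series.map applies color_to_score to every cell of one column;
# DataFrame.apply repeats that for each selected column.
scored = df[color_cols].apply(lambda col: col.map(color_to_score))

# Add the per-column scores across each row.
df["Score"] = scored.sum(axis=1)
print(df)
```

Adding another color column only requires appending its name to `color_cols`, so no per-column score columns have to be created by hand.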