How to create a new column based on two existing columns

I have a table similar to this (below). I want to create a new column called "Benchmark_Level" that shows whether the individual has gone above or below the benchmark for the metric.

Example:

Benchmark for jumps is 3; benchmark for skips is 4

I have tried a nested list comprehension:

[[ 'Below_Benchmark' if (i < 3) else 'At_Benchmark' if (i == 3) else 'Above_Benchmark' if (i > 3) else "Unknown" for i == 'Jumps ]for i in df['metric']

But I get a syntax error. I also don't know how to tie in the "Value" column. My pseudocode is: if the metric is "Jumps" and the value is less than, equal to, or greater than the benchmark, then assign "below_benchmark", "at_benchmark", or "above_benchmark" to the new column.

How do I do this?

CodePudding user response：

Use a dictionary to map each metric to its benchmark, subtract that benchmark from the value, and map the sign of the difference to a label:

import numpy as np

d = {'Jumps': 3, 'Skips': 4}

df['benchmark_level'] = (
    np.sign(df['value'].sub(df['metric'].map(d)))
    .map({1: 'above_benchmark', 0: 'at_benchmark', -1: 'below_benchmark'})
)
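
To try this end to end, here is a minimal sketch. The sample rows and the 'name' column are made up for illustration (the question's table is not shown), and the lowercase column names 'metric' and 'value' are an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real table from the question is not shown.
df = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'metric': ['Jumps', 'Jumps', 'Skips', 'Skips'],
    'value': [2, 3, 5, 4],
})

benchmarks = {'Jumps': 3, 'Skips': 4}

# Difference between each value and the benchmark for its metric.
diff = df['value'].sub(df['metric'].map(benchmarks))

# np.sign yields -1, 0 or 1, which is then mapped to a label.
labels = {1: 'above_benchmark', 0: 'at_benchmark', -1: 'below_benchmark'}
df['benchmark_level'] = np.sign(diff).map(labels)

print(df)
```

A metric that is missing from the dictionary would end up as NaN in the new column, since both map calls return NaN for keys they do not know.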

CodePudding user response：

As someone commented, it is unclear what you want to happen when the metric is "Skips", but I'll use an empty string.

df['benchmark_level'] = [
    '' if metric != "Jumps"
    else 'Below_Benchmark' if value < 3
    else 'Above_Benchmark' if value > 3
    else 'At_Benchmark' for value, metric in zip(df['value'], df['metric'])
]

The syntax error comes from the structure of the comprehension. The for clause must have the form "for x in iterable", so "for i == 'Jumps" is not valid, and the string 'Jumps is also missing its closing quote. Separately, a chained conditional expression (A if cond else B) must always end with an else that handles every case not addressed by an if, even if the conditions cover everything logically. You can use the built-in zip function to loop through both columns simultaneously.
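
If "Skips" should be compared against its own benchmark of 4, the same zip-based loop can look the benchmark up in a dictionary. This is only a sketch: the helper name level, the sample rows, and the extra "Hops" metric are hypothetical and only there to show the fallback.

```python
import pandas as pd

# Hypothetical sample data; "Hops" has no benchmark on purpose.
df = pd.DataFrame({
    'metric': ['Jumps', 'Skips', 'Hops'],
    'value': [4, 3, 1],
})

benchmarks = {'Jumps': 3, 'Skips': 4}


def level(metric, value):
    """Return the label for one row, or '' if the metric has no benchmark."""
    if metric not in benchmarks:
        return ''
    target = benchmarks[metric]
    if value < target:
        return 'Below_Benchmark'
    if value > target:
        return 'Above_Benchmark'
    return 'At_Benchmark'


# zip pairs each metric with the value from the same row.
df['benchmark_level'] = [level(m, v) for m, v in zip(df['metric'], df['value'])]

print(df)
```

Moving the branches into a small function keeps the comprehension short and avoids the long chain of conditional expressions.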

CodePudding user response：

import numpy as np

# Check each metric/benchmark combination in turn; anything else is 'No_data'.
df['benchmark_level'] = np.where(
    df['metric'].eq('Jumps') & df['value'].gt(3), 'Above_Benchmark',
    np.where(df['metric'].eq('Jumps') & df['value'].eq(3), 'At_Benchmark',
    np.where(df['metric'].eq('Jumps') & df['value'].lt(3), 'Below_Benchmark',
    np.where(df['metric'].eq('Skips') & df['value'].gt(4), 'Above_Benchmark',
    np.where(df['metric'].eq('Skips') & df['value'].eq(4), 'At_Benchmark',
    np.where(df['metric'].eq('Skips') & df['value'].lt(4), 'Below_Benchmark', 'No_data'))))))
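
As a possible alternative to nesting np.where calls, np.select takes a list of conditions and a matching list of choices. This is a different technique from the one above, shown as a sketch; the sample rows and the "Hops" metric are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "Hops" has no benchmark on purpose.
df = pd.DataFrame({
    'metric': ['Jumps', 'Jumps', 'Skips', 'Hops'],
    'value': [5, 3, 2, 7],
})

# Benchmark per row; metrics without a benchmark become NaN.
target = df['metric'].map({'Jumps': 3, 'Skips': 4})

conditions = [
    df['value'] > target,
    df['value'] == target,
    df['value'] < target,
]
choices = ['Above_Benchmark', 'At_Benchmark', 'Below_Benchmark']

# Comparisons against NaN are all False, so unknown metrics get the default.
df['benchmark_level'] = np.select(conditions, choices, default='No_data')

print(df)
```

Adding another metric then only requires a new dictionary entry instead of three more nested np.where calls.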