I have the following working code that creates some simple maths functions and a DataFrame with multi-indexed columns. I would like to conditionally apply some functions over it.
import pandas as pd
import numpy as np
# Simple maths functions ---
def add(scalars): return sum(scalars) #Adds every element in scalars
def sub(scalars): return (scalars[0] - sum(scalars[1:])) #First scalar minus the sum of the remaining scalars
def mul(scalars): return np.prod(scalars) #Multiplies every element in scalars
def divi(scalars): return (scalars[0] / np.prod(scalars[1:])) #First scalar divided by the product of the remaining scalars
# Create df ---
operatorList = [add, sub, mul, divi] # List of our maths functions
names = ['add', 'sub', 'mul', 'divi'] # List of the names of our maths functions
size = 4
tups = [('scalars', 'add', 'sc0'), ('scalars', 'add', 'sc1'), ('scalars', 'add', 'sc2'), ('scalars', 'sub', 'sc0'), ('scalars', 'sub', 'sc1'), ('scalars', 'mul', 'sc0'), ('scalars', 'mul', 'sc1'), ('scalars', 'mul', 'sc2'), ('scalars', 'divi', 'sc0'), ('scalars', 'divi', 'sc1'), ('operator', '', '')]
df = pd.DataFrame(columns=pd.MultiIndex.from_tuples(tups))
df['operatorIndex'] = np.random.randint(0, len(names), size)
df['operator'] = df['operatorIndex'].apply(lambda x: str(names[x]))
groupSize = 2
df['ID'] = np.divmod(np.arange(len(df)), groupSize)[0] + 1
df.set_index('ID', inplace=True)
df.sort_index(inplace=True)
for name in names:
    df.loc[(df['operator'] == name), ('scalars', name, df.columns.levels[2])] = np.random.randint(0, 10)
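Note that `np.random.randint(0, 10)` with no `size` returns a single integer, so the loop writes that one value into every selected cell. That is why each row in the output below repeats one number (4 4 4, 7 7). A minimal sketch, assuming you want an independent draw per cell and a reproducible frame: it swaps in NumPy's seeded `np.random.default_rng` generator (my substitution, not part of the original code) and selects each operator's own columns explicitly instead of going through `df.columns.levels[2]`.

```python
import numpy as np
import pandas as pd

names = ['add', 'sub', 'mul', 'divi']
tups = [('scalars', 'add', 'sc0'), ('scalars', 'add', 'sc1'), ('scalars', 'add', 'sc2'),
        ('scalars', 'sub', 'sc0'), ('scalars', 'sub', 'sc1'),
        ('scalars', 'mul', 'sc0'), ('scalars', 'mul', 'sc1'), ('scalars', 'mul', 'sc2'),
        ('scalars', 'divi', 'sc0'), ('scalars', 'divi', 'sc1'),
        ('operator', '', '')]
size, groupSize = 4, 2

rng = np.random.default_rng(0)  # seeded: the same frame on every run
ids = np.arange(size) // groupSize + 1  # 1, 1, 2, 2 (same grouping as the ID line)
df = pd.DataFrame(index=pd.Index(ids, name='ID'),
                  columns=pd.MultiIndex.from_tuples(tups))
op_idx = rng.integers(0, len(names), size)
df[('operatorIndex', '', '')] = op_idx
df[('operator', '', '')] = [names[i] for i in op_idx]

for name in names:
    mask = (df[('operator', '', '')] == name).to_numpy()
    if not mask.any():
        continue  # this operator was not drawn for any row
    # Only this operator's own columns, e.g. sub -> sc0, sc1 (no sc2).
    cols = [c for c in df.columns if c[:2] == ('scalars', name)]
    # A (rows x columns) array gives every cell its own draw,
    # instead of one scalar broadcast to all selected cells.
    df.loc[mask, cols] = rng.integers(0, 10, size=(int(mask.sum()), len(cols)))

print(df)
```

Masks are plain NumPy arrays, so the duplicated `ID` values never have to be aligned.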
> df
    scalars                                           operator operatorIndex
    add            sub       mul            divi
    sc0  sc1  sc2  sc0  sc1  sc0  sc1  sc2  sc0  sc1
ID
1   4    4    4    NaN  NaN  NaN  NaN  NaN  NaN  NaN  add      0
1   NaN  NaN  NaN  7    7    NaN  NaN  NaN  NaN  NaN  sub      1
2   NaN  NaN  NaN  NaN  NaN  3    3    3    NaN  NaN  mul      2
2   NaN  NaN  NaN  7    7    NaN  NaN  NaN  NaN  NaN  sub      1
How can I create a new column called Evaluation that holds the result of the correct mathematical function for each row? The operator column tells each row which function needs to be applied.
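One way to do this without row-wise `apply` is to handle one operator at a time: build a boolean mask from the operator column, pull that operator's scalar block out as a 2-D NumPy array, and evaluate each row of the block. A minimal sketch, assuming the frame holds the goal-table values (hard-coded so the result can be checked) and using a hypothetical `ops` dict that maps operator names to the question's functions:

```python
import numpy as np
import pandas as pd

# The question's helper functions.
def add(scalars): return sum(scalars)
def sub(scalars): return scalars[0] - sum(scalars[1:])
def mul(scalars): return np.prod(scalars)
def divi(scalars): return scalars[0] / np.prod(scalars[1:])

ops = {'add': add, 'sub': sub, 'mul': mul, 'divi': divi}  # hypothetical name -> function map

nan = np.nan
tups = [('scalars', 'add', 'sc0'), ('scalars', 'add', 'sc1'), ('scalars', 'add', 'sc2'),
        ('scalars', 'sub', 'sc0'), ('scalars', 'sub', 'sc1'),
        ('scalars', 'mul', 'sc0'), ('scalars', 'mul', 'sc1'), ('scalars', 'mul', 'sc2'),
        ('scalars', 'divi', 'sc0'), ('scalars', 'divi', 'sc1'),
        ('operator', '', '')]
# Scalar values copied from the goal table in the question.
rows = [[4, 5, 6, nan, nan, nan, nan, nan, nan, nan, 'add'],
        [nan, nan, nan, 7, 2, nan, nan, nan, nan, nan, 'sub'],
        [nan, nan, nan, nan, nan, 1, 2, 3, nan, nan, 'mul'],
        [nan, nan, nan, 5, 9, nan, nan, nan, nan, nan, 'sub']]
df = pd.DataFrame(rows, columns=pd.MultiIndex.from_tuples(tups),
                  index=pd.Index([1, 1, 2, 2], name='ID'))

evaluation = np.full(len(df), np.nan)            # one slot per row, by position
op_names = df[('operator', '', '')].to_numpy()
for name, func in ops.items():
    mask = op_names == name                      # rows that use this operator
    if not mask.any():
        continue
    block = df['scalars'][name].to_numpy()[mask]  # 2-D: one row per matching df row
    evaluation[mask] = [func(r) for r in block]
df[('Evaluation', '', '')] = evaluation
print(df[('Evaluation', '', '')].tolist())  # [15.0, 5.0, 6.0, -4.0]
```

Each function is still called once per row, but the column selection happens once per operator rather than once per row, and positional masks sidestep the duplicated `ID` index.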
An example goal df (with different scalar values than the random output above) would look like:
> df
    scalars                                           operator operatorIndex  Evaluation
    add            sub       mul            divi
    sc0  sc1  sc2  sc0  sc1  sc0  sc1  sc2  sc0  sc1
ID
1   4    5    6    NaN  NaN  NaN  NaN  NaN  NaN  NaN  add      0              15
1   NaN  NaN  NaN  7    2    NaN  NaN  NaN  NaN  NaN  sub      1              5
2   NaN  NaN  NaN  NaN  NaN  1    2    3    NaN  NaN  mul      2              6
2   NaN  NaN  NaN  5    9    NaN  NaN  NaN  NaN  NaN  sub      1              -4
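Each Evaluation value is just that row's function applied to that row's own scalars (for example 4 + 5 + 6 = 15 and 5 - 9 = -4). A quick check of the goal values against the question's four helpers, copied unchanged:

```python
import numpy as np

# The four helpers from the question.
def add(scalars): return sum(scalars)
def sub(scalars): return scalars[0] - sum(scalars[1:])
def mul(scalars): return np.prod(scalars)
def divi(scalars): return scalars[0] / np.prod(scalars[1:])

# Each goal row: (function, that row's own scalars, expected Evaluation).
goal_rows = [(add, [4, 5, 6], 15), (sub, [7, 2], 5), (mul, [1, 2, 3], 6), (sub, [5, 9], -4)]
for func, scalars, expected in goal_rows:
    print(func.__name__, scalars, '->', func(scalars), '| expected', expected)
```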
Answer:
Something like this will work:
df['Evaluation'] = df['operatorIndex'].apply(lambda operatorIndex: operatorList[operatorIndex](df['scalars'][names[operatorIndex]].min()))
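However, it only works if each operator appears in a single row. `.min()` reduces all rows that use an operator to one value per column, so if an operator has multiple rows in the df, every one of those rows gets the same, wrong result.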
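To see the problem, take the goal-table values, where 'sub' appears in two rows. `df['scalars']['sub'].min()` gives sc0 = min(7, 5) = 5 and sc1 = min(2, 9) = 2 (NaNs are skipped), so both 'sub' rows evaluate to 5 - 2 = 3 instead of 5 and -4. A minimal sketch reproducing this; it adds `.to_numpy()` so that `scalars[0]` inside `sub` is a positional lookup (my adjustment, since positional integer keys on a label-indexed Series are deprecated in recent pandas):

```python
import numpy as np
import pandas as pd

def add(scalars): return sum(scalars)
def sub(scalars): return scalars[0] - sum(scalars[1:])
def mul(scalars): return np.prod(scalars)
def divi(scalars): return scalars[0] / np.prod(scalars[1:])
operatorList = [add, sub, mul, divi]
names = ['add', 'sub', 'mul', 'divi']

nan = np.nan
tups = [('scalars', 'add', 'sc0'), ('scalars', 'add', 'sc1'), ('scalars', 'add', 'sc2'),
        ('scalars', 'sub', 'sc0'), ('scalars', 'sub', 'sc1'),
        ('scalars', 'mul', 'sc0'), ('scalars', 'mul', 'sc1'), ('scalars', 'mul', 'sc2'),
        ('scalars', 'divi', 'sc0'), ('scalars', 'divi', 'sc1'),
        ('operator', '', ''), ('operatorIndex', '', '')]
# Scalar values copied from the goal table in the question.
rows = [[4, 5, 6, nan, nan, nan, nan, nan, nan, nan, 'add', 0],
        [nan, nan, nan, 7, 2, nan, nan, nan, nan, nan, 'sub', 1],
        [nan, nan, nan, nan, nan, 1, 2, 3, nan, nan, 'mul', 2],
        [nan, nan, nan, 5, 9, nan, nan, nan, nan, nan, 'sub', 1]]
df = pd.DataFrame(rows, columns=pd.MultiIndex.from_tuples(tups),
                  index=pd.Index([1, 1, 2, 2], name='ID'))

# Same idea as the answer: one .min() per operator, collapsing all of its rows.
wrong = df[('operatorIndex', '', '')].apply(
    lambda k: operatorList[k](df['scalars'][names[k]].min().to_numpy()))
print(wrong.tolist())  # [15.0, 3.0, 6.0, 3.0]  (goal: [15, 5, 6, -4])
```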
You can fix that by keeping a global counter that is incremented for each row, and using iloc to access that row of scalars by position:
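rowCounter = -1  # reset this before every run; apply() advances it once per row

def func(operatorIndex):
    global rowCounter
    rowCounter += 1
    # .to_numpy() keeps scalars[0] positional inside sub() and divi()
    return operatorList[operatorIndex](df['scalars'][names[operatorIndex]].iloc[rowCounter].to_numpy())

df['Evaluation'] = df['operatorIndex'].apply(func)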
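The global counter relies on `apply` calling `func` exactly once per row, in order, and on `rowCounter` being reset to -1 before each run; rerunning the `apply` line on its own pushes `iloc` past the last row. A minimal sketch of the same positional lookup without global state, assuming the goal-table values and enumerating the rows directly:

```python
import numpy as np
import pandas as pd

def add(scalars): return sum(scalars)
def sub(scalars): return scalars[0] - sum(scalars[1:])
def mul(scalars): return np.prod(scalars)
def divi(scalars): return scalars[0] / np.prod(scalars[1:])
operatorList = [add, sub, mul, divi]
names = ['add', 'sub', 'mul', 'divi']

nan = np.nan
tups = [('scalars', 'add', 'sc0'), ('scalars', 'add', 'sc1'), ('scalars', 'add', 'sc2'),
        ('scalars', 'sub', 'sc0'), ('scalars', 'sub', 'sc1'),
        ('scalars', 'mul', 'sc0'), ('scalars', 'mul', 'sc1'), ('scalars', 'mul', 'sc2'),
        ('scalars', 'divi', 'sc0'), ('scalars', 'divi', 'sc1'),
        ('operator', '', ''), ('operatorIndex', '', '')]
# Scalar values copied from the goal table in the question.
rows = [[4, 5, 6, nan, nan, nan, nan, nan, nan, nan, 'add', 0],
        [nan, nan, nan, 7, 2, nan, nan, nan, nan, nan, 'sub', 1],
        [nan, nan, nan, nan, nan, 1, 2, 3, nan, nan, 'mul', 2],
        [nan, nan, nan, 5, 9, nan, nan, nan, nan, nan, 'sub', 1]]
df = pd.DataFrame(rows, columns=pd.MultiIndex.from_tuples(tups),
                  index=pd.Index([1, 1, 2, 2], name='ID'))

# i is the row position (what rowCounter tracked), k is that row's operatorIndex.
df[('Evaluation', '', '')] = [
    operatorList[k](df['scalars'][names[k]].iloc[i].to_numpy())
    for i, k in enumerate(df[('operatorIndex', '', '')])
]
print(df[('Evaluation', '', '')].tolist())  # [15.0, 5.0, 6.0, -4.0]
```

Because the position comes from `enumerate`, running the line twice gives the same result.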
Answer:
I managed to adapt the answer from @user17242583:
df['Evaluation'] = df.apply(
    lambda row: operatorList[row['operatorIndex'].values[0]](
        row['scalars'][names[row['operatorIndex'].values[0]]].to_numpy()
    ),
    axis=1,
)
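In this version, `row['operatorIndex']` is a partial key on the row's three-level index, so pandas returns a one-element Series rather than a number; that is why `.values[0]` is needed. A minimal sketch of the same row-wise `apply`, assuming you address the single-valued columns by their full tuple key (e.g. `('operator', '', '')`) and look the function up in a name-to-function dict (`ops`, hypothetical, not from the answers):

```python
import numpy as np
import pandas as pd

def add(scalars): return sum(scalars)
def sub(scalars): return scalars[0] - sum(scalars[1:])
def mul(scalars): return np.prod(scalars)
def divi(scalars): return scalars[0] / np.prod(scalars[1:])
operatorList = [add, sub, mul, divi]
names = ['add', 'sub', 'mul', 'divi']
ops = dict(zip(names, operatorList))  # hypothetical: operator name -> function

nan = np.nan
tups = [('scalars', 'add', 'sc0'), ('scalars', 'add', 'sc1'), ('scalars', 'add', 'sc2'),
        ('scalars', 'sub', 'sc0'), ('scalars', 'sub', 'sc1'),
        ('scalars', 'mul', 'sc0'), ('scalars', 'mul', 'sc1'), ('scalars', 'mul', 'sc2'),
        ('scalars', 'divi', 'sc0'), ('scalars', 'divi', 'sc1'),
        ('operator', '', ''), ('operatorIndex', '', '')]
# Scalar values copied from the goal table in the question.
rows = [[4, 5, 6, nan, nan, nan, nan, nan, nan, nan, 'add', 0],
        [nan, nan, nan, 7, 2, nan, nan, nan, nan, nan, 'sub', 1],
        [nan, nan, nan, nan, nan, 1, 2, 3, nan, nan, 'mul', 2],
        [nan, nan, nan, 5, 9, nan, nan, nan, nan, nan, 'sub', 1]]
df = pd.DataFrame(rows, columns=pd.MultiIndex.from_tuples(tups),
                  index=pd.Index([1, 1, 2, 2], name='ID'))

def evaluate(row):
    # A full (level0, level1, level2) key returns a plain scalar,
    # whereas row['operator'] would return a one-element Series.
    name = row[('operator', '', '')]
    # Partial keys drop levels: row['scalars'][name] keeps only sc0, sc1, ...
    return ops[name](row['scalars'][name].to_numpy())

df[('Evaluation', '', '')] = df.apply(evaluate, axis=1)
print(df[('Evaluation', '', '')].tolist())  # [15.0, 5.0, 6.0, -4.0]
```

Row-wise `apply` calls a Python function for every row, so it is convenient rather than fast on large frames.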
I have no idea if this will be useful to anyone else, but here you go.