Pandas applying with multi-indexed columns

Time:11-22

I have the following working code that creates some simple maths functions and a DataFrame with multi-indexed columns. I would like to conditionally apply some functions over it.

import pandas as pd
import numpy as np

# Simple maths functions  ---
def add(scalars): return sum(scalars)  # Sum of every element in scalars
def sub(scalars): return (scalars[0] - sum(scalars[1:]))  # First scalar minus the sum of the rest
def mul(scalars): return np.prod(scalars)  # Product of every element in scalars
def divi(scalars): return (scalars[0] / np.prod(scalars[1:]))  # First scalar divided by the product of the rest

# Create df ---

operatorList = [add, sub, mul, divi]  # List of our maths functions
names = ['add', 'sub', 'mul', 'divi'] # List of the names of our maths functions
size = 4

tups = [('scalars', 'add', 'sc0'), ('scalars', 'add', 'sc1'), ('scalars', 'add', 'sc2'), ('scalars', 'sub', 'sc0'), ('scalars', 'sub', 'sc1'), ('scalars', 'mul', 'sc0'), ('scalars', 'mul', 'sc1'), ('scalars', 'mul', 'sc2'), ('scalars', 'divi', 'sc0'), ('scalars', 'divi', 'sc1'), ('operator', '', '')]
df = pd.DataFrame(columns=pd.MultiIndex.from_tuples(tups))

df['operatorIndex'] = np.random.randint(0, len(names), size)
df['operator'] = df['operatorIndex'].apply(lambda x: str(names[x]))

groupSize = 2
df['ID'] = np.divmod(np.arange(len(df)), groupSize)[0] + 1  # Group rows in pairs, numbered from 1
df.set_index('ID', inplace=True)
df.sort_index(inplace=True)

for name in names:
  df.loc[(df['operator'] == name), ('scalars', name, df.columns.levels[2])] = np.random.randint(0, 10)
> df

    scalars                                   operator  operatorIndex
    add         sub     mul         divi        
    sc0 sc1 sc2 sc0 sc1 sc0 sc1 sc2 sc0 sc1     
ID                                              
1   4   4   4   NaN NaN NaN NaN NaN NaN NaN   add       0
1   NaN NaN NaN 7   7   NaN NaN NaN NaN NaN   sub       1
2   NaN NaN NaN NaN NaN 3   3   3   NaN NaN   mul       2
2   NaN NaN NaN 7   7   NaN NaN NaN NaN NaN   sub       1

How can I create a new column called Evaluation that holds the result of the correct mathematical function for each row? The operator column tells each row which function needs to be applied.

An example goal df would look like this:

> df

    scalars                                   operator  operatorIndex  Evaluation
    add         sub     mul         divi        
    sc0 sc1 sc2 sc0 sc1 sc0 sc1 sc2 sc0 sc1     
ID                                              
1   4   5   6   NaN NaN NaN NaN NaN NaN NaN   add       0              15
1   NaN NaN NaN 7   2   NaN NaN NaN NaN NaN   sub       1              5
2   NaN NaN NaN NaN NaN 1   2   3   NaN NaN   mul       2              6
2   NaN NaN NaN 5   9   NaN NaN NaN NaN NaN   sub       1              -4
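For reference, the Evaluation values in the goal table are just the helper functions above applied to each row's non-NaN scalars. A quick check, with the scalars copied by hand from the table (the rows list is only for illustration):

```python
import numpy as np

def add(scalars): return sum(scalars)
def sub(scalars): return scalars[0] - sum(scalars[1:])
def mul(scalars): return np.prod(scalars)
def divi(scalars): return scalars[0] / np.prod(scalars[1:])

# (function, non-NaN scalars) for each row of the goal table, copied by hand
rows = [(add, [4, 5, 6]), (sub, [7, 2]), (mul, [1, 2, 3]), (sub, [5, 9])]
evaluations = [func(scalars) for func, scalars in rows]
print(evaluations)
```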

Answer:

Something like this will work:

df['Evaluation'] = df['operatorIndex'].apply(lambda operatorIndex: operatorList[operatorIndex](df['scalars'][names[operatorIndex]].min()))

However, this only works if each operator appears in a single row. .min() collapses each column over all rows (skipping NaN), so if an operator has multiple rows in the df, their values get mixed together and the result is wrong.
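To see the problem concretely, here is a small sketch using only the two sub columns from the goal table, rebuilt by hand as a plain DataFrame (rows 2 and 4 both use sub):

```python
import numpy as np
import pandas as pd

# The sub columns of the goal table: rows 2 and 4 both use the sub operator
sub_block = pd.DataFrame({'sc0': [np.nan, 7, np.nan, 5],
                          'sc1': [np.nan, 2, np.nan, 9]})

# .min() reduces each column over all rows (skipping NaN), so it returns
# min(7, 5) for sc0 and min(2, 9) for sc1: a mix of the two rows
collapsed = sub_block.min()
print(collapsed.tolist())  # [5.0, 2.0]

# Applying sub() to the collapsed values gives the same answer for both rows
values = collapsed.to_numpy()
print(values[0] - values[1:].sum())  # 3.0, instead of 5 and -4
```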

You can fix that by keeping a global counter that is incremented for each row, and using iloc to access that row of scalars by position:

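rowCounter = -1  # Position of the row being processed; reset to -1 before re-running

def func(operatorIndex):
    global rowCounter
    rowCounter += 1  # Series.apply visits the values in order, so this tracks the row position
    return operatorList[operatorIndex](df['scalars'][names[operatorIndex]].iloc[rowCounter])

df['Evaluation'] = df['operatorIndex'].apply(func)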
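The global counter can be avoided by letting enumerate() supply the row position. Below is a minimal sketch, assuming the goal table from the question (rebuilt by hand, with the operator column left out because it is not needed here). It also converts each row to a NumPy array with .to_numpy(), so that scalars[0] inside sub() and divi() is positional; recent pandas versions deprecate treating an integer key as a position on a label-indexed Series.

```python
import numpy as np
import pandas as pd

def add(scalars): return sum(scalars)
def sub(scalars): return scalars[0] - sum(scalars[1:])
def mul(scalars): return np.prod(scalars)
def divi(scalars): return scalars[0] / np.prod(scalars[1:])

operatorList = [add, sub, mul, divi]
names = ['add', 'sub', 'mul', 'divi']

# The goal table from the question, rebuilt by hand (operator column omitted)
nan = np.nan
df = pd.DataFrame(
    {
        ('scalars', 'add', 'sc0'): [4, nan, nan, nan],
        ('scalars', 'add', 'sc1'): [5, nan, nan, nan],
        ('scalars', 'add', 'sc2'): [6, nan, nan, nan],
        ('scalars', 'sub', 'sc0'): [nan, 7, nan, 5],
        ('scalars', 'sub', 'sc1'): [nan, 2, nan, 9],
        ('scalars', 'mul', 'sc0'): [nan, nan, 1, nan],
        ('scalars', 'mul', 'sc1'): [nan, nan, 2, nan],
        ('scalars', 'mul', 'sc2'): [nan, nan, 3, nan],
        ('scalars', 'divi', 'sc0'): [nan] * 4,
        ('scalars', 'divi', 'sc1'): [nan] * 4,
        ('operatorIndex', '', ''): [0, 1, 2, 1],
    },
    index=pd.Index([1, 1, 2, 2], name='ID'),
)

# enumerate() provides the row position that rowCounter tracked by hand
evaluations = []
for pos, op in enumerate(df['operatorIndex']):
    scalars = df['scalars'][names[op]].iloc[pos].to_numpy()
    evaluations.append(operatorList[op](scalars))
df['Evaluation'] = evaluations
```

Unlike the global counter, nothing has to be reset before running this a second time.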

Answer:

I managed to adapt the answer from @user17242583:

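# For each row, look up the function from operatorIndex and pass it that operator's scalars as a NumPy array
df['Evaluation'] = df.apply(
    lambda row: operatorList[row['operatorIndex'].values[0]](
        row['scalars'][names[row['operatorIndex'].values[0]]].to_numpy()
    ),
    axis=1,
)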

I have no idea if this will be useful to anyone else, but here you go.
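Both apply-based answers call a Python function once per row, which can get slow on large frames. A different technique, not taken from any of the answers: handle one operator at a time and compute all of its rows with a single NumPy call. This is a sketch under stated assumptions: the vectorized dict holds hypothetical axis-aware rewrites of the question's helpers, the df is the goal table rebuilt by hand, and every row's scalars for its own operator are assumed to be non-NaN.

```python
import numpy as np
import pandas as pd

names = ['add', 'sub', 'mul', 'divi']

# Hypothetical axis-aware rewrites of the question's helpers: each takes a 2-D
# array (one row per matching DataFrame row) and returns one value per row.
vectorized = {
    'add': lambda a: a.sum(axis=1),
    'sub': lambda a: a[:, 0] - a[:, 1:].sum(axis=1),
    'mul': lambda a: a.prod(axis=1),
    'divi': lambda a: a[:, 0] / a[:, 1:].prod(axis=1),
}

# The goal table from the question, rebuilt by hand (operator column omitted)
nan = np.nan
df = pd.DataFrame(
    {
        ('scalars', 'add', 'sc0'): [4, nan, nan, nan],
        ('scalars', 'add', 'sc1'): [5, nan, nan, nan],
        ('scalars', 'add', 'sc2'): [6, nan, nan, nan],
        ('scalars', 'sub', 'sc0'): [nan, 7, nan, 5],
        ('scalars', 'sub', 'sc1'): [nan, 2, nan, 9],
        ('scalars', 'mul', 'sc0'): [nan, nan, 1, nan],
        ('scalars', 'mul', 'sc1'): [nan, nan, 2, nan],
        ('scalars', 'mul', 'sc2'): [nan, nan, 3, nan],
        ('scalars', 'divi', 'sc0'): [nan] * 4,
        ('scalars', 'divi', 'sc1'): [nan] * 4,
        ('operatorIndex', '', ''): [0, 1, 2, 1],
    },
    index=pd.Index([1, 1, 2, 2], name='ID'),
)

op_index = df['operatorIndex'].to_numpy()
evaluation = np.full(len(df), np.nan)
for i, name in enumerate(names):
    mask = op_index == i  # Rows that use this operator
    if mask.any():
        # This operator's scalar columns, restricted to its own rows
        block = df['scalars'][name].to_numpy(dtype=float)[mask]
        evaluation[mask] = vectorized[name](block)
df['Evaluation'] = evaluation
```

Each operator costs one NumPy call no matter how many rows use it; the trade-off is that the helper functions have to be rewritten to work along axis=1.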
