Home > Software design >  How to evaluate a formula based on different values of one colume in Python Dataframe
How to evaluate a formula based on different values of one colume in Python Dataframe

Time:07-07

I have a dataframe and I want to evaluate a formula such like result = 2*apple - melon - orange and group by testid.

my df is like below

df = pd.DataFrame({'testid':(1,2,1,2,1,2),'Name':('apple','apple','melon','melon','orange','orange'), 'A': (1,2,10,20,5,5), 'B': (1,5,4,2,3,1)})
testid Name A B
1 apple 1 1
2 apple 2 5
1 melon 10 4
2 melon 20 2
1 orange 5 3
2 orange 5 1

and I want my result as for each testid's column A and B, do apple * 2 - melon - orange. and then save into new column result_A and result_B, group by testid

testid result_A result_B
1 -13 -5
2 -21 7

which function can get this result?

I tried to use df.eval() like

eq = 'df[df.Name==\'apple\',[\'A\',\'B\']] *2 - df[df.Name==\'melon\',[\'A\',\'B\']] - df[df.Name==\'orange\',[\'A\',\'B\']]'
df.eval(eq)

but it tells me

UndefinedVariableError: name 'df' is not defined

i realized i should use pd.eval(engine='python') instead of df.eval() and now the problem is that different subset of df return different index which result them cannot add or minus

df[df.Name=='apple'][['A','B']]*2

this gives me

A B
0 2 2
1 4 10

however

df[df.Name=='melon'][['A','B']]

this gives me

A B
2 10 4
3 20 2

so they can't add or minus together because index not match

CodePudding user response:

Try this

(
    df
    .pivot('testid', 'Name', ['A', 'B'])# reshape df into a wide multiindex df
    .stack(0)                           # remove multiindex
    .eval('apple*2-melon-orange')       # apply formula (creates a Series)
    .unstack()                          # convert to df
    .add_prefix('result_')              # add prefix to column names
    .reset_index()
)

enter image description here

CodePudding user response:

You can define a method first (assuming there is only one apple, melon, orange for each test id)

def compute(g):
  a = g.loc[g['Name']=='apple'].iloc[0]
  m = g.loc[g['Name']=='melon'].iloc[0]
  o = g.loc[g['Name']=='orange'].iloc[0]
  
  return pd.DataFrame({'A':[a['A']*2 - m['A'] - o['A']],
                       'B':[a['B']*2 - m['B'] - o['B']]}, index=pd.Index([0]))

Then apply on the group like:

   df = df.groupby('testid').apply(compute).reset_index(level=1, drop=True).reset_index()

print(df):

   testid   A  B
0       1 -13 -5
1       2 -21  7
  • Related