I have a df
:
id A1 A2 a b c d
1 50 300 40 0 25 50
2 100 50 50 100 30 100
This data represents a warehouse, A1
is current items in the warehouse, A2
is how many items is shipped each month, a, b, c, d
represents months.
I am trying to apply such logic where I calculate stock on each months b, c, d
:
id A1 A2 a b c d stock_on_b stock_on_c stock_on_d
1 50 300 40 0 25 50 A1-A2 a b A1-A2 a b c A1-A2 a b c d
2 100 50 50 100 30 100 A1-A2 a b A1-A2 a b c A1-A2 a b c d
I wrote a function for each month, which looks like this:
# stock left on b
def b_stock(x):
return 0 if x['A2'] > (x['A1'] x.loc[:,'a':'b'].sum(axis=1))
else x['A1'] x.loc[:,'a':'b'].sum(axis=1))
df['stock_on_b'] = df.apply(b_stock, axis=1)
But I get an error:
IndexingError: Too many indexers
I think this is because I am trying to use .loc
on a row, because .apply
works on rows and not the whole df
but I am unsure, how could I apply this logic using column name range instead of writing every single column name in the equation like this
return 0 if x['A2'] > (x['A1'] x['A1'] x['a'] x['b'] x['c'])
else x['A1'] x['A1'] x['a'] x['b'] x['c'])
CodePudding user response:
Vectorized solution with numpy.where
:
s = df.loc[:,'a':'b'].sum(axis=1)
m = df['A2'] > df['A1'] s
df['stock_on_b'] = np.where(m, 0, df['A1'] s)
Apply alternative (slowier, because processing per rows):
def b_stock(x):
#x is Series (row, because apply with axis=1)
#print (x)
v = x.loc['a':'b'].sum()
return 0 if x['A2'] > (x['A1'] v) else x['A1'] v
df['stock_on_b1'] = df.apply(b_stock, axis=1)