Summing values based on range of column names using indexing when using apply in pandas-CodePudding

I have a df:

id    A1        A2        a    b    c    d
1     50        300       40   0    25   50
2     100       50        50   100  30   100

This data represents a warehouse, A1 is current items in the warehouse, A2 is how many items is shipped each month, a, b, c, d represents months.

I am trying to apply such logic where I calculate stock on each months b, c, d:

id    A1        A2        a    b    c    d         stock_on_b      stock_on_c      stock_on_d    
1     50        300       40   0    25   50        A1-A2 a b       A1-A2 a b c     A1-A2 a b c d    
2     100       50        50   100  30   100       A1-A2 a b       A1-A2 a b c     A1-A2 a b c d

I wrote a function for each month, which looks like this:

# stock left on b
def b_stock(x):

    return 0 if x['A2'] > (x['A1']    x.loc[:,'a':'b'].sum(axis=1)) 
             else x['A1']   x.loc[:,'a':'b'].sum(axis=1))

df['stock_on_b'] = df.apply(b_stock, axis=1)

But I get an error:

IndexingError: Too many indexers

I think this is because I am trying to use .loc on a row, because .apply works on rows and not the whole df but I am unsure, how could I apply this logic using column name range instead of writing every single column name in the equation like this

return 0 if x['A2'] > (x['A1']    x['A1']   x['a']   x['b']   x['c']) 
             else x['A1']   x['A1']   x['a']   x['b']   x['c'])

CodePudding user response：

Vectorized solution with numpy.where:

s = df.loc[:,'a':'b'].sum(axis=1)
m = df['A2'] > df['A1']   s 

df['stock_on_b'] = np.where(m, 0, df['A1']   s)

Apply alternative (slowier, because processing per rows):

def b_stock(x):
    #x is Series (row, because apply with axis=1)
    #print (x)
    v = x.loc['a':'b'].sum()
    return 0 if x['A2'] > (x['A1']   v) else x['A1']   v

df['stock_on_b1'] = df.apply(b_stock, axis=1)