python pandas version of Excel SUM OFFSET function

Time:03-11

Let's say the input is:

import pandas as pd

d = {'col1': [0,0,0,0,0,0,0,0,0,0],
     'col2': [1,1,1,1,1,1,1,1,1,1],
     'col3': [2,2,2,2,2,2,2,2,2,2],
     'col4': [3,3,3,3,3,3,3,3,3,3],
     'col5': [4,4,4,4,4,4,4,4,4,4],
     'col6': [5,5,5,5,5,5,5,5,5,5],
     'c_off': [0,1,2,3,0,1,2,3,0,1]}
df = pd.DataFrame(data=d)
ref_width = 2

I want to get an output column like this:

df['output'] = [1, 3, 5, 7, 1, 3, 5, 7, 1, 3]

Here ref_width is the number of columns to sum over, and c_off is how many columns to the right of col1 the sum starts.

For example, when c_off is 0, the sum covers the first two columns, col1 and col2. When c_off is 1, it covers col2 and col3. When c_off is 3, it covers col4 and col5.
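
To pin down the rule described above, here is a minimal, non-vectorized reference sketch. The helper name sum_offset_reference and its parameters are hypothetical, introduced only for illustration; it assumes every column other than c_off is a value column and that c_off + ref_width never runs past the last value column.

```python
import pandas as pd

# Rebuild the question's input: col1..col6 hold 0..5 in every row.
d = {f"col{i + 1}": [i] * 10 for i in range(6)}
d["c_off"] = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
df = pd.DataFrame(data=d)
ref_width = 2


def sum_offset_reference(frame, offset_col="c_off", width=2):
    """Hypothetical helper: row-by-row version of the desired rule."""
    # Assumption: every column except the offset column is a value column.
    value_cols = [c for c in frame.columns if c != offset_col]
    values = frame[value_cols].to_numpy()
    offsets = frame[offset_col].to_numpy()
    # For each row, sum `width` consecutive values starting at that row's offset.
    return [int(values[r, o:o + width].sum()) for r, o in enumerate(offsets)]


expected = sum_offset_reference(df, width=ref_width)
print(expected)  # [1, 3, 5, 7, 1, 3, 5, 7, 1, 3]
```

A loop like this is slow on large frames, but it is handy for checking any vectorized answer against.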

Is there a vectorized way to do this?

CodePudding user response:

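Here's a vectorized solution that uses numpy broadcasting:

import numpy as np

cols_to_process = 6

# One-hot mask marking, for each row, the column where its window starts
x = np.arange(cols_to_process) == df['c_off'].to_numpy()[:, None]
# OR in copies of the mask shifted right by 1..ref_width-1, so each row marks ref_width columns
x |= np.add.reduce([np.pad(x, ((0,0),(i,0)))[:, :-i] for i in range(1,ref_width)]).astype(bool)

df['output'] = df.iloc[:, :cols_to_process].to_numpy()[x].reshape(-1, ref_width).sum(axis=1)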

Output:

>>> df
   col1  col2  col3  col4  col5  col6  c_off  output
0     0     1     2     3     4     5      0       1
1     0     1     2     3     4     5      1       3
2     0     1     2     3     4     5      2       5
3     0     1     2     3     4     5      3       7
4     0     1     2     3     4     5      0       1
5     0     1     2     3     4     5      1       3
6     0     1     2     3     4     5      2       5
7     0     1     2     3     4     5      3       7
8     0     1     2     3     4     5      0       1
9     0     1     2     3     4     5      1       3
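
As a possible alternative to the mask approach above (not part of the original answer), the column indices of each row's window can be built directly by broadcasting and then gathered with np.take_along_axis. This is only a sketch: the names values, idx and window_sums are made up for illustration, and it assumes c_off + ref_width never exceeds cols_to_process.

```python
import numpy as np
import pandas as pd

# Rebuild the question's input: col1..col6 hold 0..5 in every row.
d = {f"col{i + 1}": [i] * 10 for i in range(6)}
d["c_off"] = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
df = pd.DataFrame(data=d)
ref_width = 2
cols_to_process = 6

values = df.iloc[:, :cols_to_process].to_numpy()

# Broadcasting (n_rows, 1) + (ref_width,) gives, per row, the column
# indices c_off, c_off + 1, ..., c_off + ref_width - 1.
idx = df["c_off"].to_numpy()[:, None] + np.arange(ref_width)

# Gather those columns row by row, then sum each window.
window_sums = np.take_along_axis(values, idx, axis=1).sum(axis=1)

df["output"] = window_sums
print(df["output"].tolist())  # [1, 3, 5, 7, 1, 3, 5, 7, 1, 3]
```

The intermediate arrays here have shape (n_rows, ref_width) rather than (n_rows, cols_to_process), which may matter when there are many columns and a narrow window.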

CodePudding user response:

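Try this. It turns each row into an array and slices it starting at the offset stored in the last element, so it assumes c_off is the last column of df (run it on the original frame, before any output column has been added):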

df['output'] = df.agg(np.array, 1).apply(lambda x: x[x[-1]:-1][:ref_width].sum())
print(df)