Let's say the input is:
import pandas as pd

d = {'col1': [0,0,0,0,0,0,0,0,0,0],
     'col2': [1,1,1,1,1,1,1,1,1,1],
     'col3': [2,2,2,2,2,2,2,2,2,2],
     'col4': [3,3,3,3,3,3,3,3,3,3],
     'col5': [4,4,4,4,4,4,4,4,4,4],
     'col6': [5,5,5,5,5,5,5,5,5,5],
     'c_off': [0,1,2,3,0,1,2,3,0,1]}
df = pd.DataFrame(data=d)
ref_width = 2
I want to get an output column like this:
df['output'] = [1, 3, 5, 7, 1, 3, 5, 7, 1, 3]
Here ref_width is the number of columns to sum over, and c_off is how many columns to the right of col1 to start summing.
For example, when c_off is 0, that's just summing the first 2 columns, col1 and col2. When c_off is 1, that's summing col2 and col3. When it's 3, sum col4 and col5.
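To pin the rule down, a plain row-by-row loop would look roughly like the sketch below. The helper name windowed_sum_loop and its n_value_cols parameter are made up for illustration, and the sketch assumes the value columns are the first six columns of df.

```python
import pandas as pd

d = {'col1': [0] * 10, 'col2': [1] * 10, 'col3': [2] * 10,
     'col4': [3] * 10, 'col5': [4] * 10, 'col6': [5] * 10,
     'c_off': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]}
df = pd.DataFrame(data=d)
ref_width = 2

def windowed_sum_loop(frame, width, n_value_cols=6):
    # Hypothetical helper: for each row, sum `width` consecutive value
    # columns, starting `c_off` columns to the right of the first one.
    values = frame.iloc[:, :n_value_cols].to_numpy()
    offsets = frame['c_off'].to_numpy()
    return [int(values[i, off:off + width].sum()) for i, off in enumerate(offsets)]

expected = windowed_sum_loop(df, ref_width)
print(expected)
```

This is the slow version; it is only meant as a reference to check a vectorized answer against.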
Is there a vectorized way to do this?
CodePudding user response:
Here's a vectorized solution that uses numpy broadcasting:
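import numpy as np
cols_to_process = 6
# One-hot mask: for each row, mark the column at offset c_off
x = np.arange(cols_to_process) == df['c_off'].to_numpy()[:, None]
# Combine copies shifted right by 0..ref_width-1 so each row marks ref_width columns
x = np.add.reduce([np.pad(x, ((0,0),(i,0)))[:, :cols_to_process] for i in range(ref_width)]).astype(bool)
# Pull out the marked values row by row and sum each group of ref_width
df['output'] = df.iloc[:, :cols_to_process].to_numpy()[x].reshape(-1, ref_width).sum(axis=1)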
Output:
>>> df
   col1  col2  col3  col4  col5  col6  c_off  output
0     0     1     2     3     4     5      0       1
1     0     1     2     3     4     5      1       3
2     0     1     2     3     4     5      2       5
3     0     1     2     3     4     5      3       7
4     0     1     2     3     4     5      0       1
5     0     1     2     3     4     5      1       3
6     0     1     2     3     4     5      2       5
7     0     1     2     3     4     5      3       7
8     0     1     2     3     4     5      0       1
9     0     1     2     3     4     5      1       3
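The same window can also be selected with integer indices rather than a boolean mask. This is only a sketch, assuming every row satisfies c_off + ref_width <= cols_to_process; the names vals and idx are introduced here for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

d = {'col1': [0] * 10, 'col2': [1] * 10, 'col3': [2] * 10,
     'col4': [3] * 10, 'col5': [4] * 10, 'col6': [5] * 10,
     'c_off': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]}
df = pd.DataFrame(data=d)
ref_width = 2
cols_to_process = 6

vals = df.iloc[:, :cols_to_process].to_numpy()
# Broadcasting: row i gets the column indices
# c_off[i], c_off[i] + 1, ..., c_off[i] + ref_width - 1.
idx = df['c_off'].to_numpy()[:, None] + np.arange(ref_width)
# Gather those columns per row, then sum across each window.
df['output'] = np.take_along_axis(vals, idx, axis=1).sum(axis=1)
print(df)
```

An out-of-range offset raises an IndexError here, so bad c_off values should show up as an error rather than as a wrong sum.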
CodePudding user response:
Try this (it relies on c_off being the last column of df, so that x[-1] is the offset):
df['output'] = df.agg(np.array, 1).apply(lambda x: x[x[-1]:-1][:ref_width].sum())
print(df)
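A row-wise variant that names the value columns explicitly, so it does not depend on where c_off sits in the frame, could be sketched as follows. The names value_cols and row_window_sum are hypothetical, and like any apply over axis=1 this is not truly vectorized.

```python
import pandas as pd

d = {'col1': [0] * 10, 'col2': [1] * 10, 'col3': [2] * 10,
     'col4': [3] * 10, 'col5': [4] * 10, 'col6': [5] * 10,
     'c_off': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]}
df = pd.DataFrame(data=d)
ref_width = 2
value_cols = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']

def row_window_sum(row):
    # Hypothetical helper: slice ref_width values starting at this row's c_off.
    start = int(row['c_off'])
    return row[value_cols].to_numpy()[start:start + ref_width].sum()

df['output'] = df.apply(row_window_sum, axis=1)
print(df)
```

Because the columns are selected by name, running it again after output has been added gives the same result.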