Subtract pandas dataframe using list to choose diferent columns in each iteration-CodePudding

I'm trying to use a list as an index in a Dataframe subtract operation. However I get the following error: cannot do positional indexing on Index with these indexers

I have these two DataFrames:

df1:

index	t1	t2	t3	t4	t5	...	t950
a,1	0,00001	0,00002	0,00003	0,00004	0,00008	...	0,00004
a,2	0,00001	0,00002	0,00003	0,00005	0,00007	...	0,00004
b,1	0,00004	0,00003	0,00002	0,00006	0,00006	...	0,00001
b,2	0,00005	0,00004	0,00003	0,00007	0,00005	...	0,00002

df2:

index	t1	t2	t3	t4	t5	...	t950
a,1	0,00008	0,00007	0,00007	0,00006	0,00004	...	0,00002
a,2	0,00007	0,00006	0,00005	0,00004	0,00003	...	0,00002
b,1	0,00002	0,00001	0,00002	0,00003	0,00004	...	0,00004
b,2	0,00005	0,00006	0,00007	0,00008	0,00009	...	0,00004

And I have a list too which includes the index for each column from where the subtract should start:

index_col

[2,3,1,2]

My code nowadays is as follows:

result=df1.subtract(df2.iloc[:,index_col:].rename(columns=dict(zip(df2.iloc[:,index_col:].columns,df2.columns))

My expected result is:

index	t1	t2	t3	t4	t5	...	t950
a,1	-0,00006	-0,00004	-0,00001	...	...	...	0,00002
a,2	-0,00003	-0,00001	...	...	...	...	0,00002
b,1	-0,00003	-0,00001	0,00001	-0,00002	...	...	0,00004
b,2	-0,00002	-0,00004	-0,00006	...	...	...	0,00004

Where, for example, in the first row:

t1 - t3 // t2 - t4 // t3 - t5 because in df2 it should start in the third column (as the first index_col value reflects).

Do you know how I can calculate this subtract following my list as a column index ? I know I can do this with a loop but I want to try to avoid it and use the power of vectorization.

Thanks you so much!

CodePudding user response：

Assuming you want to subtract 0 when the row runs out.

Let shifts be the list [2,3,1,2], what you call index_col.

Might not be the nicest/most elegant solution, but I think this will do what you want:

import numpy as np
from scipy.ndimage import shift

shifted = np.vstack([shift(row, -k) for row, k in zip(df2.values, shifts)])
result = df1 - shifted

Full example:

import numpy as np
import pandas as pd
from scipy.ndimage import shift


df1 = pd.DataFrame(np.random.random((4, 5)), index=list("ABCD"), columns=list("abcde"))
df2 = pd.DataFrame(np.random.random((4, 5)))
shifts = [2, 3, 1, 2]

>>> df1
          a         b         c         d         e
A  0.308420  0.591043  0.914204  0.407474  0.670670
B  0.371686  0.989710  0.823255  0.145337  0.437014
C  0.023323  0.590014  0.685122  0.558222  0.238016
D  0.996939  0.974608  0.117192  0.539702  0.622569

>>> df2
          0         1         2         3         4
0  0.825861  0.356216  0.668553  0.090403  0.915279
1  0.699088  0.239835  0.728388  0.143411  0.025266
2  0.600219  0.280567  0.267376  0.100538  0.511053
3  0.538059  0.705963  0.633112  0.138550  0.455539

>>> shifted = np.vstack([shift(row, -k) for row, k in zip(df2.values, shifts)])
>>> df1 - shifted
          a         b         c         d         e
A -0.360133  0.500640 -0.001074  0.407474  0.670670
B  0.228275  0.964444  0.823255  0.145337  0.437014
C -0.257244  0.322639  0.584583  0.047168  0.238016
D  0.363827  0.836059 -0.338347  0.539702  0.622569

If you don't want to import scipy, you can write your own shift function: see Shift elements in a numpy array.