Home > front end >  Subtract pandas dataframe using list to choose diferent columns in each iteration
Subtract pandas dataframe using list to choose diferent columns in each iteration

Time:05-18

I'm trying to use a list as an index in a Dataframe subtract operation. However I get the following error: cannot do positional indexing on Index with these indexers

I have these two DataFrames:

df1:

index t1 t2 t3 t4 t5 ... t950
a,1 0,00001 0,00002 0,00003 0,00004 0,00008 ... 0,00004
a,2 0,00001 0,00002 0,00003 0,00005 0,00007 ... 0,00004
b,1 0,00004 0,00003 0,00002 0,00006 0,00006 ... 0,00001
b,2 0,00005 0,00004 0,00003 0,00007 0,00005 ... 0,00002

df2:

index t1 t2 t3 t4 t5 ... t950
a,1 0,00008 0,00007 0,00007 0,00006 0,00004 ... 0,00002
a,2 0,00007 0,00006 0,00005 0,00004 0,00003 ... 0,00002
b,1 0,00002 0,00001 0,00002 0,00003 0,00004 ... 0,00004
b,2 0,00005 0,00006 0,00007 0,00008 0,00009 ... 0,00004

And I have a list too which includes the index for each column from where the subtract should start:

index_col

[2,3,1,2]

My code nowadays is as follows:

result=df1.subtract(df2.iloc[:,index_col:].rename(columns=dict(zip(df2.iloc[:,index_col:].columns,df2.columns))

My expected result is:

index t1 t2 t3 t4 t5 ... t950
a,1 -0,00006 -0,00004 -0,00001 ... ... ... 0,00002
a,2 -0,00003 -0,00001 ... ... ... ... 0,00002
b,1 -0,00003 -0,00001 0,00001 -0,00002 ... ... 0,00004
b,2 -0,00002 -0,00004 -0,00006 ... ... ... 0,00004

Where, for example, in the first row:

t1 - t3 // t2 - t4 // t3 - t5 because in df2 it should start in the third column (as the first index_col value reflects).

Do you know how I can calculate this subtract following my list as a column index ? I know I can do this with a loop but I want to try to avoid it and use the power of vectorization.

Thanks you so much!

CodePudding user response:

Assuming you want to subtract 0 when the row runs out.

Let shifts be the list [2,3,1,2], what you call index_col.

Might not be the nicest/most elegant solution, but I think this will do what you want:

import numpy as np
from scipy.ndimage import shift

shifted = np.vstack([shift(row, -k) for row, k in zip(df2.values, shifts)])
result = df1 - shifted

Full example:

import numpy as np
import pandas as pd
from scipy.ndimage import shift


df1 = pd.DataFrame(np.random.random((4, 5)), index=list("ABCD"), columns=list("abcde"))
df2 = pd.DataFrame(np.random.random((4, 5)))
shifts = [2, 3, 1, 2]
>>> df1
          a         b         c         d         e
A  0.308420  0.591043  0.914204  0.407474  0.670670
B  0.371686  0.989710  0.823255  0.145337  0.437014
C  0.023323  0.590014  0.685122  0.558222  0.238016
D  0.996939  0.974608  0.117192  0.539702  0.622569

>>> df2
          0         1         2         3         4
0  0.825861  0.356216  0.668553  0.090403  0.915279
1  0.699088  0.239835  0.728388  0.143411  0.025266
2  0.600219  0.280567  0.267376  0.100538  0.511053
3  0.538059  0.705963  0.633112  0.138550  0.455539

>>> shifted = np.vstack([shift(row, -k) for row, k in zip(df2.values, shifts)])
>>> df1 - shifted
          a         b         c         d         e
A -0.360133  0.500640 -0.001074  0.407474  0.670670
B  0.228275  0.964444  0.823255  0.145337  0.437014
C -0.257244  0.322639  0.584583  0.047168  0.238016
D  0.363827  0.836059 -0.338347  0.539702  0.622569

If you don't want to import scipy, you can write your own shift function: see Shift elements in a numpy array.

  • Related