I'm trying to use a list as an index in a DataFrame subtract operation. However, I get the following error: cannot do positional indexing on Index with these indexers
I have these two DataFrames:
df1:
index | t1 | t2 | t3 | t4 | t5 | ... | t950 |
---|---|---|---|---|---|---|---|
a,1 | 0,00001 | 0,00002 | 0,00003 | 0,00004 | 0,00008 | ... | 0,00004 |
a,2 | 0,00001 | 0,00002 | 0,00003 | 0,00005 | 0,00007 | ... | 0,00004 |
b,1 | 0,00004 | 0,00003 | 0,00002 | 0,00006 | 0,00006 | ... | 0,00001 |
b,2 | 0,00005 | 0,00004 | 0,00003 | 0,00007 | 0,00005 | ... | 0,00002 |
df2:
index | t1 | t2 | t3 | t4 | t5 | ... | t950 |
---|---|---|---|---|---|---|---|
a,1 | 0,00008 | 0,00007 | 0,00007 | 0,00006 | 0,00004 | ... | 0,00002 |
a,2 | 0,00007 | 0,00006 | 0,00005 | 0,00004 | 0,00003 | ... | 0,00002 |
b,1 | 0,00002 | 0,00001 | 0,00002 | 0,00003 | 0,00004 | ... | 0,00004 |
b,2 | 0,00005 | 0,00006 | 0,00007 | 0,00008 | 0,00009 | ... | 0,00004 |
I also have a list that gives, for each row, the column index in df2 from which the subtraction should start:
index_col
[2,3,1,2]
My current code is as follows:
result = df1.subtract(df2.iloc[:, index_col:].rename(columns=dict(zip(df2.iloc[:, index_col:].columns, df2.columns))))
My expected result is:
index | t1 | t2 | t3 | t4 | t5 | ... | t950 |
---|---|---|---|---|---|---|---|
a,1 | -0,00006 | -0,00004 | -0,00001 | ... | ... | ... | 0,00002 |
a,2 | -0,00003 | -0,00001 | ... | ... | ... | ... | 0,00002 |
b,1 | -0,00003 | -0,00001 | 0,00001 | -0,00002 | ... | ... | 0,00004 |
b,2 | -0,00002 | -0,00004 | -0,00006 | ... | ... | ... | 0,00004 |
Where, for example, in the first row: t1 - t3 // t2 - t4 // t3 - t5, because in df2 the subtraction should start at the third column (as the first value of index_col indicates).
Do you know how I can compute this subtraction using my list as the column index? I know I can do this with a loop, but I want to avoid it and use the power of vectorization.
Thank you so much!
CodePudding user response:
Assuming you want to subtract 0 when the row runs out.
Let shifts be the list [2, 3, 1, 2], i.e. what you call index_col.
This might not be the nicest/most elegant solution, but I think it will do what you want:
import numpy as np
from scipy.ndimage import shift
# Shift each row of df2 to the left by its own amount, padding the end with 0 (scipy's default cval).
shifted = np.vstack([shift(row, -k) for row, k in zip(df2.values, shifts)])
# shifted is a plain ndarray, so the result keeps df1's index and columns.
result = df1 - shifted
Full example:
import numpy as np
import pandas as pd
from scipy.ndimage import shift
df1 = pd.DataFrame(np.random.random((4, 5)), index=list("ABCD"), columns=list("abcde"))
df2 = pd.DataFrame(np.random.random((4, 5)))
shifts = [2, 3, 1, 2]
>>> df1
a b c d e
A 0.308420 0.591043 0.914204 0.407474 0.670670
B 0.371686 0.989710 0.823255 0.145337 0.437014
C 0.023323 0.590014 0.685122 0.558222 0.238016
D 0.996939 0.974608 0.117192 0.539702 0.622569
>>> df2
0 1 2 3 4
0 0.825861 0.356216 0.668553 0.090403 0.915279
1 0.699088 0.239835 0.728388 0.143411 0.025266
2 0.600219 0.280567 0.267376 0.100538 0.511053
3 0.538059 0.705963 0.633112 0.138550 0.455539
>>> shifted = np.vstack([shift(row, -k) for row, k in zip(df2.values, shifts)])
>>> df1 - shifted
a b c d e
A -0.360133 0.500640 -0.001074 0.407474 0.670670
B 0.228275 0.964444 0.823255 0.145337 0.437014
C -0.257244 0.322639 0.584583 0.047168 0.238016
D 0.363827 0.836059 -0.338347 0.539702 0.622569
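If you want to drop the per-row Python loop entirely, here is a loop-free sketch of my own using NumPy fancy indexing (a variation, not part of the scipy approach above), under the same assumption that positions running past the last column subtract 0:
import numpy as np
vals = df2.values
n_rows, n_cols = vals.shape
shifts_arr = np.asarray(shifts)
# Column each cell should read from after shifting its row to the left.
cols = np.arange(n_cols) + shifts_arr[:, None]
inside = cols < n_cols  # cells that still fall inside df2
# Gather with fancy indexing (clipping keeps the indices legal),
# then zero out the positions that ran past the last column.
shifted = np.where(inside, vals[np.arange(n_rows)[:, None], np.minimum(cols, n_cols - 1)], 0.0)
result = df1 - shifted  # keeps df1's index and columns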
If you don't want to import scipy, you can write your own shift function: see Shift elements in a numpy array.
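For completeness, a minimal NumPy-only sketch of such a shift helper (shift_left is just an illustrative name, and it assumes non-negative shifts, as in this question) could be:
import numpy as np

def shift_left(row, k, fill=0.0):
    # Return row shifted k positions to the left, padding the end with fill.
    # Assumes k >= 0; if k >= len(row) the whole output is fill.
    out = np.full(row.shape, fill, dtype=float)
    if k < len(row):
        out[:len(row) - k] = row[k:]
    return out

shifted = np.vstack([shift_left(row, k) for row, k in zip(df2.values, shifts)])
result = df1 - shifted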