Pandas Avoid Multidimensional Key Error Comparing 2 Dataframes

Time:02-19

I am stuck on a "multidimensional key" ValueError. I have a DataFrame that looks like this:

    year      RMSE  index  cyear  Corr_to_CY
0   2000  0.279795      5   1997    0.997975
1   2011  0.299011      2   1994    0.997792
2   2003  0.368341      1   1993    0.977143
3   2013  0.377902     23   2015    0.824441
4   1999   0.41495     10   2002    0.804633
5   1997  0.435813      8   2000    0.752724
6   2018  0.491003     24   2016    0.703359
7   2002  0.505771      3   1995    0.684926
8   2009  0.529308     17   2009    0.580481
9   2015  0.584146     27   2019    0.556555
10  2004  0.620946     26   2018    0.500790
11  2016  0.659388     22   2014    0.443543
12  1993  0.700942     19   2011    0.431615
13  2006  0.748086     11   2003    0.375111
14  2007  0.766675     21   2013    0.323143
15  2020  0.827913     12   2004    0.149202
16  2014  0.884109      7   1999    0.002438
17  2012  0.900184      0   1992   -0.351615
18  1995  0.919482     28   2020   -0.448915
19  1992  0.930512     20   2012   -0.563762
20  2001  0.967834     18   2010   -0.613170
21  2019   1.00497      9   2001   -0.677590
22  2005   1.00885     13   2005   -0.695690
23  2010  1.159125     14   2006   -0.843122
24  2017  1.173262     15   2007   -0.931034
25  1994  1.179737      6   1998   -0.939697
26  2008  1.212915     25   2017   -0.981626
27  1996  1.308853     16   2008   -0.985893
28  1998  1.396771      4   1996   -0.999990

I select the rows where 'Corr_to_CY' >= 0.70 and store the corresponding 'cyear' values in a new DataFrame called 'cyears'. I then need the 'year' and 'RMSE' values for every row whose 'year' appears in 'cyears'. My best attempt raises ValueError: Cannot index with multidimensional key. Do I need to convert 'cyears' to something else (a Series, a list, etc.) for this to work? Thank you. Here is the code that produces the error:

cyears = comp.loc[comp['Corr_to_CY']>= 0.7,'cyear']
cyears = cyears.to_frame()
result = comp.loc[comp['year'] == cyears,'RMSE']

ValueError: Cannot index with multidimensional key

CodePudding user response:

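You can use the isin method. Comparing comp['year'] to a DataFrame with == does not produce the one-dimensional boolean mask that .loc needs for row selection, whereas isin tests each year for membership in cyears and returns a boolean Series: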

import pandas as pd

# Sample creation
import io
comp = pd.read_csv(io.StringIO('year,RMSE,index,cyear,Corr_to_CY\n2000,0.279795,5,1997,0.997975\n2011,0.299011,2,1994,0.997792\n2003,0.368341,1,1993,0.977143\n2013,0.377902,23,2015,0.824441\n1999,0.41495,10,2002,0.804633\n1997,0.435813,8,2000,0.752724\n2018,0.491003,24,2016,0.703359\n2002,0.505771,3,1995,0.684926\n2009,0.529308,17,2009,0.580481\n2015,0.584146,27,2019,0.556555\n2004,0.620946,26,2018,0.500790\n2016,0.659388,22,2014,0.443543\n1993,0.700942,19,2011,0.431615\n2006,0.748086,11,2003,0.375111\n2007,0.766675,21,2013,0.323143\n2020,0.827913,12,2004,0.149202\n2014,0.884109,7,1999,0.002438\n2012,0.900184,0,1992,-0.351615\n1995,0.919482,28,2020,-0.448915\n1992,0.930512,20,2012,-0.563762\n2001,0.967834,18,2010,-0.613170\n2019,1.00497,9,2001,-0.677590\n2005,1.00885,13,2005,-0.695690\n2010,1.159125,14,2006,-0.843122\n2017,1.173262,15,2007,-0.931034\n1994,1.179737,6,1998,-0.939697\n2008,1.212915,25,2017,-0.981626\n1996,1.308853,16,2008,-0.985893\n1998,1.396771,4,1996,-0.999990\n'))

# Operations
cyears = comp.loc[comp['Corr_to_CY']>= 0.7,'cyear']
result = comp.loc[comp['year'].isin(cyears),'RMSE']
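The question asks for both the year and the RMSE value, while the line above returns only RMSE. A minimal sketch of returning both columns, assuming a small subset of the question's table (the first eight rows, with the 'index' column left out) and a helper name `mask` that is not in the original code:

```python
import pandas as pd

# Illustrative subset of the question's data (first eight rows only).
comp = pd.DataFrame({
    "year": [2000, 2011, 2003, 2013, 1999, 1997, 2018, 2002],
    "RMSE": [0.279795, 0.299011, 0.368341, 0.377902,
             0.41495, 0.435813, 0.491003, 0.505771],
    "cyear": [1997, 1994, 1993, 2015, 2002, 2000, 2016, 1995],
    "Corr_to_CY": [0.997975, 0.997792, 0.977143, 0.824441,
                   0.804633, 0.752724, 0.703359, 0.684926],
})

# 1-D Series of cyear values whose correlation is at least 0.7.
cyears = comp.loc[comp["Corr_to_CY"] >= 0.7, "cyear"]

# Boolean Series: True where the row's year appears in cyears.
mask = comp["year"].isin(cyears)

# Pass a list of column labels to get both columns back as a DataFrame.
result = comp.loc[mask, ["year", "RMSE"]]
print(result)
```

On this subset the matching years are 2000, 1997 and 2002; on the full table the set of matches will differ.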

If you want to keep cyears as a pandas DataFrame instead of a Series, select the column with a list (['cyear']) and pass the column itself to isin:

# Operations
cyears = comp.loc[comp['Corr_to_CY']>= 0.7, ['cyear']]
result = comp.loc[comp['year'].isin(cyears.cyear),'RMSE']
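To the "series, list, etc." part of the question: as far as I know, isin accepts any one-dimensional list-like, so a one-column DataFrame only needs to be reduced to one dimension first. A rough sketch with made-up three-row data (the names `as_series`, `as_list` and `as_array` are just for illustration):

```python
import pandas as pd

# Tiny made-up frames, only to compare the three conversions.
comp = pd.DataFrame({
    "year": [2000, 1997, 2011],
    "RMSE": [0.279795, 0.435813, 0.299011],
})
cyears = pd.DataFrame({"cyear": [1997, 2000, 2016]})  # one-column DataFrame

as_series = cyears["cyear"]             # column as a Series
as_list = cyears["cyear"].tolist()      # plain Python list
as_array = cyears["cyear"].to_numpy()   # 1-D NumPy array

# Each 1-D form gives the same row selection.
results = [
    comp.loc[comp["year"].isin(values), "RMSE"].tolist()
    for values in (as_series, as_list, as_array)
]
print(results)
```

All three forms select the same rows here, so which one to use is mostly a matter of what the rest of the code expects.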