Home > Back-end >  Pandas Groupby Multiple Conditions KeyError
Pandas Groupby Multiple Conditions KeyError

Time:09-22

I have a df called df_out with column names such as this in the following insert but I cannot for some reason use 'groupby' function with the column headers since it keeps giving me KeyError: 'year'. I"ve researched and tried stripping white space, resetting the index, allowing white space before my groupby setting, etc and I cannot get past this KeyError. The df_out looks like this:

df_out.columns
Out[185]: 
Index(['year', 'month', 'BARTON CHAPEL', 'BARTON I', 'BIG HORN I',
       'BLUE CREEK', 'BUFFALO RIDGE I', 'CAYUGA RIDGE', 'COLORADO GREEN',
       'DESERT WIND', 'DRY LAKE I', 'EL CABO', 'GROTON', 'NEW HARVEST',
       'PENASCAL I', 'RUGBY', 'TULE'],
      dtype='object', name='plant_name')

But, when I use df_out.head(), I get a different answer with the leading column of 'plant_name' so this maybe is where the error is coming from or related. Here is the output columns from -

df_out.head()
Out[187]: 
plant_name  year  month  BARTON CHAPEL  BARTON I  BIG HORN I  BLUE CREEK  \
0           1991      1       6.432285  7.324126    5.170067    6.736384   
1           1991      2       7.121324  6.973586    4.922693    7.473527   
2           1991      3       8.125793  8.681317    5.796599    8.401855   
3           1991      4       7.454972  8.037764    7.272292    7.961625   
4           1991      5       7.012809  6.530013    6.626949    6.009825   

plant_name  BUFFALO RIDGE I  CAYUGA RIDGE  COLORADO GREEN  DESERT WIND  \
0                  7.163790      7.145323        5.783629     5.682003   
1                  7.595744      7.724717        6.245952     6.269524   
2                  8.111411      9.626075        7.918871     6.657648   
3                  8.807458      8.618806        7.011444     5.848736   
4                  7.734852      6.267097        7.410013     5.099610   

plant_name  DRY LAKE I    EL CABO    GROTON  NEW HARVEST  PENASCAL I  \
0             4.721089  10.747285  7.456640     6.921801    6.296425   
1             5.095923   8.891057  7.239762     7.449122    6.484241   
2             8.409637  12.238508  8.274046     8.824758    8.444960   
3             7.893694  10.837139  6.381736     8.840431    7.282444   
4             8.496976   8.636882  6.856747     7.469825    7.999530   

plant_name     RUGBY       TULE  
0           7.028360   4.110605  
1           6.394687   5.257128  
2           6.859462  10.789516  
3           7.590153   7.425153  
4           7.556546   8.085255 

My groupby statement that is getting the KeyError looks like this and I'm trying to calculate the average by rows of year and month based on a subset of columns from df_out found in the list - 'west':

west=['BIG HORN I','DRY LAKE I', 'TULE']
westavg = df_out[df_out.columns[df_out.columns.isin(west)]].groupby(['year','month']).mean()

thank you very much,

CodePudding user response:

Your code can be broken down as:

westavg =  (df_out[df_out.columns[df_out.columns.isin(west)]]
                 .groupby(['year','month']).mean()
           )

which is not working because ['year','month'] are not columns of df_out[df_out.columns[df_out.columns.isin(west)]].

Try:

west_cols = [c for c in df_out if c in west]
westavg = df_out.groupby(['year','month'])[west_cols].mean()

CodePudding user response:

Ok, with the help of Quang Hoang below, I understood the problem and came up with this answer that works that I am able to understand a bit better using .intersection:

westavg = df_out[df_out.columns.intersection(west)].mean(axis=1)

#gives me average of each row from the subset of columns defined by the list 'west'`.

  • Related