I have a df called df_out with column names such as this in the following insert but I cannot for some reason use 'groupby' function with the column headers since it keeps giving me KeyError: 'year'. I"ve researched and tried stripping white space, resetting the index, allowing white space before my groupby setting, etc and I cannot get past this KeyError. The df_out looks like this:
df_out.columns
Out[185]:
Index(['year', 'month', 'BARTON CHAPEL', 'BARTON I', 'BIG HORN I',
'BLUE CREEK', 'BUFFALO RIDGE I', 'CAYUGA RIDGE', 'COLORADO GREEN',
'DESERT WIND', 'DRY LAKE I', 'EL CABO', 'GROTON', 'NEW HARVEST',
'PENASCAL I', 'RUGBY', 'TULE'],
dtype='object', name='plant_name')
But, when I use df_out.head(), I get a different answer with the leading column of 'plant_name' so this maybe is where the error is coming from or related. Here is the output columns from -
df_out.head()
Out[187]:
plant_name year month BARTON CHAPEL BARTON I BIG HORN I BLUE CREEK \
0 1991 1 6.432285 7.324126 5.170067 6.736384
1 1991 2 7.121324 6.973586 4.922693 7.473527
2 1991 3 8.125793 8.681317 5.796599 8.401855
3 1991 4 7.454972 8.037764 7.272292 7.961625
4 1991 5 7.012809 6.530013 6.626949 6.009825
plant_name BUFFALO RIDGE I CAYUGA RIDGE COLORADO GREEN DESERT WIND \
0 7.163790 7.145323 5.783629 5.682003
1 7.595744 7.724717 6.245952 6.269524
2 8.111411 9.626075 7.918871 6.657648
3 8.807458 8.618806 7.011444 5.848736
4 7.734852 6.267097 7.410013 5.099610
plant_name DRY LAKE I EL CABO GROTON NEW HARVEST PENASCAL I \
0 4.721089 10.747285 7.456640 6.921801 6.296425
1 5.095923 8.891057 7.239762 7.449122 6.484241
2 8.409637 12.238508 8.274046 8.824758 8.444960
3 7.893694 10.837139 6.381736 8.840431 7.282444
4 8.496976 8.636882 6.856747 7.469825 7.999530
plant_name RUGBY TULE
0 7.028360 4.110605
1 6.394687 5.257128
2 6.859462 10.789516
3 7.590153 7.425153
4 7.556546 8.085255
My groupby statement that is getting the KeyError looks like this and I'm trying to calculate the average by rows of year and month based on a subset of columns from df_out found in the list - 'west':
west=['BIG HORN I','DRY LAKE I', 'TULE']
westavg = df_out[df_out.columns[df_out.columns.isin(west)]].groupby(['year','month']).mean()
thank you very much,
CodePudding user response:
Your code can be broken down as:
westavg = (df_out[df_out.columns[df_out.columns.isin(west)]]
.groupby(['year','month']).mean()
)
which is not working because ['year','month']
are not columns of df_out[df_out.columns[df_out.columns.isin(west)]]
.
Try:
west_cols = [c for c in df_out if c in west]
westavg = df_out.groupby(['year','month'])[west_cols].mean()
CodePudding user response:
Ok, with the help of Quang Hoang below, I understood the problem and came up with this answer that works that I am able to understand a bit better using .intersection:
westavg = df_out[df_out.columns.intersection(west)].mean(axis=1)
#gives me average of each row from the subset of columns defined by the list 'west'`.