I have the following code, where I am trying to group the dataframe 'newdata' by its first column and then find the mean of the column '0_happy'.
However, the output gives me numbers in scientific notation ("e" numbers), which seem far too big/small to be mean values.
I would be so grateful if anybody could point out where I may be going wrong?
newdatahappy = newdata.groupby(newdata.iloc[:,0])[str(0) + '_happy'].mean()
print(newdatahappy)
0_type
fullMiss    3.113534e+20
hit         1.893626e+07
nearMiss    4.149066e+13
Name: 0_happy, dtype: float64
The first 10 rows of newdata:
0_type 0_happy 0_motive
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 fullMiss 37 66
6 nearMiss 33 67
7 hit 75 60
8 fullMiss 36 63
9 hit 74 42
10 nearMiss 19 45
CodePudding user response:
I tried to test your code on the first 10 entries. As expected, it seems to work fine.
I would suggest using the column name directly in groupby, like this:
import numpy as np
import pandas as pd

# to make it easier for other folks trying to check this out
NaN = np.nan
data = [[NaN,NaN,NaN],
[NaN,NaN,NaN],
[NaN,NaN,NaN],
[NaN,NaN,NaN],
[NaN,NaN,NaN],
["fullMiss",37,66],
["nearMiss",33,67],
["hit",75,60],
["fullMiss",36,63],
["hit",74,42],
["nearMiss",19,45]]
newdata = pd.DataFrame(data, columns=['0_type', '0_happy', '0_motive'])
# you can use the column name directly in groupby
newdata.groupby("0_type")["0_happy"].mean()
But that shouldn't affect the results.
You could also use a pivot table like this:
newdata.pivot_table(index='0_type', values='0_happy',aggfunc="mean")
Result:
          0_happy
0_type
fullMiss     36.5
hit          74.5
nearMiss     26.0
Anyway, I would suggest you check your actual input data to make sure you don't have extreme outliers, for example.
To start, check the maximum values:
newdata.pivot_table(index='0_type', values='0_happy',aggfunc="max")
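Beyond just the maximum, a quick way to spot outliers is per-group summary statistics. A minimal sketch, using only the six non-NaN sample rows from the question:

```python
import pandas as pd

# Rebuild the non-NaN part of the sample data from the question
data = [["fullMiss", 37], ["nearMiss", 33], ["hit", 75],
        ["fullMiss", 36], ["hit", 74], ["nearMiss", 19]]
df = pd.DataFrame(data, columns=["0_type", "0_happy"])

# count, mean, std, min, quartiles, and max for each group in one call
print(df.groupby("0_type")["0_happy"].describe())
```

If `max` (or `std`) for a group is wildly larger than you expect, the huge means come from the data itself, not from the groupby.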
CodePudding user response:
Change your float display format option so pandas prints plain decimals instead of scientific notation:
pd.options.display.float_format = '{:.2f}'.format
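Note this only changes how the numbers are displayed, not their values. A minimal sketch, using the means from the accepted sample data:

```python
import pandas as pd

# Display floats with two decimal places instead of scientific notation
pd.options.display.float_format = '{:.2f}'.format

means = pd.Series([36.5, 74.5, 26.0],
                  index=["fullMiss", "hit", "nearMiss"], name="0_happy")
print(means)  # values now print as 36.50, 74.50, 26.00
```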