I have the following code, where I am trying to group the dataframe 'newdata' by its first column and then find the mean of the column '0_happy'.
However, the output gives me numbers in scientific notation ("e" numbers), which seem far too big/small to be mean values.
I would be so grateful if anybody could point out where I may be going wrong?
newdatahappy = newdata.groupby(newdata.iloc[:,0])[str(0) + '_happy'].mean()
print(newdatahappy)
0_type
fullMiss    3.113534e+20
hit         1.893626e+07
nearMiss    4.149066e+13
Name: 0_happy, dtype: float64
The first 10 rows of newdata:
0_type 0_happy 0_motive
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 fullMiss 37 66
6 nearMiss 33 67
7 hit 75 60
8 fullMiss 36 63
9 hit 74 42
10 nearMiss 19 45
CodePudding user response:
I tried to test your code on the first 10 entries. As expected, it seems to work fine.
I would suggest using the column name directly in groupby, like this:
import numpy as np
import pandas as pd

# to make it easier for other folks trying to check this out
NaN = np.nan
data = [[NaN,NaN,NaN],
[NaN,NaN,NaN],
[NaN,NaN,NaN],
[NaN,NaN,NaN],
[NaN,NaN,NaN],
["fullMiss",37,66],
["nearMiss",33,67],
["hit",75,60],
["fullMiss",36,63],
["hit",74,42],
["nearMiss",19,45]]
newdata = pd.DataFrame(data, columns=['0_type', '0_happy', '0_motive'])
# you can use the column name directly in groupby
newdata.groupby("0_type")["0_happy"].mean()
But that shouldn't affect the results.
You could also use a pivot table like this:
newdata.pivot_table(index='0_type', values='0_happy',aggfunc="mean")
Result:
          0_happy
0_type
fullMiss     36.5
hit          74.5
nearMiss     26.0
Anyway, I would suggest you check your actual input data to make sure you don't have extreme outliers, for example.
To start, check the maximum values:
newdata.pivot_table(index='0_type', values='0_happy',aggfunc="max")
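Beyond just the maximum, a quick way to spot outliers is per-group summary statistics. A minimal sketch, using only the six non-NaN sample rows from the question:

```python
import pandas as pd

# Rebuild the non-NaN part of the sample data from the question
data = [["fullMiss", 37], ["nearMiss", 33], ["hit", 75],
        ["fullMiss", 36], ["hit", 74], ["nearMiss", 19]]
df = pd.DataFrame(data, columns=["0_type", "0_happy"])

# count, mean, std, min, quartiles, and max for each group in one call
print(df.groupby("0_type")["0_happy"].describe())
```

If `max` (or `std`) for a group is wildly larger than you expect, the huge means come from the data itself, not from the groupby.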
CodePudding user response:
Change your float display format option so pandas prints plain decimals instead of scientific notation:
pd.options.display.float_format = '{:.2f}'.format
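Note this only changes how the numbers are displayed, not their values. A minimal sketch, using the means from the accepted sample data:

```python
import pandas as pd

# Display floats with two decimal places instead of scientific notation
pd.options.display.float_format = '{:.2f}'.format

means = pd.Series([36.5, 74.5, 26.0],
                  index=["fullMiss", "hit", "nearMiss"], name="0_happy")
print(means)  # values now print as 36.50, 74.50, 26.00
```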