Difference between histogram and pandas value_count()

Time:11-26

I assumed that both pandas value_counts() and a histogram give the frequency of an item, but I have a case where they differ. When I plot a histogram, I get two peaks, as shown below:

import pandas as pd

d = pd.read_csv('sample.csv')
d.hist()  # histogram of each numeric column
d['value'].value_counts().nlargest(3)  # three most frequent exact values


200000000.0    906
20.0           219
10.0           158
Name: value, dtype: int64

[image: histogram of the value column, showing two peaks]

But when I use value_counts(), I only get 200000000 as the most frequent value, whereas from the histogram I expected something around 0.02. Can someone explain what exactly happens here? The sample data that I used is [image: sample data].

The (approximate) equivalent of the histogram can be computed with pd.cut, which groups the values into bins before counting them.

Output of pd.cut(df['value'], bins=10).value_counts(sort=False):

(-199999.996, 20000000.004]       1523
(20000000.004, 40000000.003]         5
(40000000.003, 60000000.003]         9
(60000000.003, 80000000.002]         5
(80000000.002, 100000000.002]        0
(100000000.002, 120000000.002]       8
(120000000.002, 140000000.001]       0
(140000000.001, 160000000.001]       0
(160000000.001, 180000000.0]         0
(180000000.0, 200000000.0]         906
Name: value, dtype: int64
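
To make the difference concrete, here is a minimal sketch on synthetic data. The numbers are made up for illustration and are not the asker's sample.csv; only the column name `value` and the count of 906 are borrowed from the question. It shows that value_counts() counts exact values, while pd.cut (like a histogram) counts everything that falls into a bin, so many distinct small values add up to one tall bar.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption, not the original file): many distinct
# small values plus one large value repeated 906 times.
rng = np.random.default_rng(0)
small = rng.integers(0, 500, size=1500).astype(float)
large = np.full(906, 200_000_000.0)
df = pd.DataFrame({"value": np.concatenate([small, large])})

# Exact-value frequencies: each distinct number is counted separately.
exact = df["value"].value_counts()
most_common_value = exact.idxmax()
most_common_count = int(exact.max())

# Binned frequencies: 10 equal-width bins, the same idea as a histogram.
binned = pd.cut(df["value"], bins=10).value_counts(sort=False)

# np.histogram produces the bar heights that a 10-bin histogram would draw.
hist_counts, bin_edges = np.histogram(df["value"], bins=10)

print(most_common_value, most_common_count)
print(binned)
print(hist_counts)
```

In this sketch no single small value is frequent, so value_counts() ranks 200000000.0 first, yet the first bin is the tallest bar because it collects all 1500 small values.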

CodePudding user response:

They are the same thing. If you check the CSV file, you will find that 200000000.0 occurs exactly 906 times, and that is what both outputs show. The histogram just abbreviates the axis numbers with a 1e8 scale factor, so a tick labeled 2.0 stands for 2.0 × 1e8 = 200000000.
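
The axis abbreviation can be checked directly. The following is a rough sketch, assuming matplotlib's default tick formatter and synthetic data (not the asker's file): after drawing the figure, the x-axis offset text holds the scale factor that the tick labels must be multiplied by.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (an assumption): small values plus a repeated large value.
values = np.concatenate([np.linspace(0, 100, 1500), np.full(906, 2e8)])

fig, ax = plt.subplots()
ax.hist(values, bins=10)

# Drawing the canvas makes the formatter compute tick labels and offset text.
fig.canvas.draw()
offset_text = ax.xaxis.get_offset_text().get_text()

# With default settings this is the scale factor shown at the axis corner.
print(offset_text)
```

If the abbreviated labels are confusing, calling `ax.ticklabel_format(style="plain", axis="x")` before drawing asks matplotlib to print the full numbers instead.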
