How to calculate area of two intersecting histograms?

Time:07-11

I have two datasets:

import numpy as np
import matplotlib.pyplot as plt

data1 = np.array([11.59877469, 12.52525819, 11.30371594,  9.09666023,  9.69178239,
       15.50122531, 16.73578474, 11.59361365,  9.33284219, 10.24323511,
       12.39877469, 14.22419388, 11.18491025,  9.38917885,  9.56598452,
       13.40122531, 13.58126828, 11.66021816,  9.90587666, 10.09661371,
       12.69877469, 13.36333954, 11.40602325,  9.04926015,  9.58116652,
       14.30122531, 13.41946488, 11.50330911,  9.70846369, 11.57496747,
       13.09877469, 15.59128195, 12.38077981,  9.24187201,  9.4187133 ,
       14.50122531, 14.00030896, 10.91137571, 10.09368741, 10.59655608,
       12.19877469, 12.66493842, 11.151608  , 10.04291564, 10.75044566,
       13.90122531, 13.80879138, 12.01213962,  9.77266764, 10.6047969 ,
       11.29877469, 11.91469995, 10.49547065,  9.54128679, 10.70684913,
       15.80122531, 15.69100383, 11.00648136, 10.26949539, 11.05984311,
       13.29877469, 14.36322429, 10.39796445,  9.15126197, 10.19866594,
       13.90122531, 13.22035631, 11.47720851, 10.43120555, 11.41316408,
       13.09877469, 12.39251915, 11.18971136,  9.9574122 , 10.25744447,
       14.20122531, 13.27219366, 11.60801697,  9.72976318, 12.07620948,
       13.69877469, 16.08363333, 12.54008941,  9.38917885, 10.24985347,
       14.20122531, 13.84544674, 11.29659942, 10.29799654, 10.78807365,
       12.39877469, 13.80385357, 11.79124697,  9.85270434, 10.47867621,
       15.30122531, 12.68375859, 11.88853282, 10.36460104, 11.52377995,
       12.09877469, 12.36280497, 11.03280231,  8.76603826, 11.57464465,
       15.70122531, 16.69782975, 11.86483278,  9.75586377, 11.00865558,
       14.39877469, 14.31138694, 10.65958136, 10.03571398,  9.75056092,
       13.30122531, 13.52878112, 12.46156728, 10.58571404, 12.04584549,
       12.69877469, 13.2230095 , 11.41802602,  9.82180264,  9.27339153,
       14.50122531, 13.09520827, 10.91617682,  9.59895472, 12.33344676,
       13.29877469, 16.05391916, 12.32137696,  9.41767999,  9.83146261,
       14.00122531, 14.13174836, 11.67702203,  9.43724456, 11.17869979,
       12.19877469, 13.48024677, 10.98060112, 10.22832584, 10.64047961,
       14.90122531, 13.34485454, 11.90053559, 10.23379258, 11.59774047,
       13.19877469, 12.95059023, 10.88549547,  9.31057157, 11.71497469,
       15.40122531, 16.38875513, 12.68748139, 10.10058357, 12.50913252,
       14.09877469, 14.37710664, 11.30162088, 10.18782192, 10.10388188,
       14.30122531, 13.33032237, 11.20389433, 10.76632313, 12.05343649,
       13.09877469, 12.8918117 , 11.89595483,  9.50078288, 10.53745474,
       14.70122531, 13.50665794, 11.46071013,  9.62505531, 12.11286483,
       14.19877469, 16.84775416, 13.15122724, 10.29252979, 10.48626721,
       14.60122531, 14.15322172, 11.65332199, 10.40960057, 12.80367496,
       13.99877469, 13.09027045, 10.75948812,  9.74799647, 11.09682545,
       14.70122531, 14.32261612, 12.15495084, 11.38675768, 12.20135753])

data2 = np.array([ 8.29102276,  8.81896403,  8.34976878,  6.42352427,  7.60717448,
       11.70897724, 12.33303488,  8.46800077,  5.99812922,  7.01110385,
        9.19102276, 10.2088251 ,  7.81963878,  5.4700672 ,  5.9739585 ,
        8.70897724,  8.96487767,  7.99967417,  6.66146823,  6.83476828,
        7.69102276,  8.39992335,  8.16675914,  6.29511636,  6.39364899,
       10.30897724,  8.45040312,  8.19228603,  6.36654906,  8.26193451,
        8.59102276, 11.18658667,  8.97980554,  5.79348751,  6.00367267,
        9.60897724,  8.72976357,  7.91417073,  6.37615127,  6.71721123,
        7.49102276,  8.6868748 ,  7.18480092,  6.0359    ,  5.92277097,
        9.60897724,  8.545837  ,  8.47760298,  5.90062301,  6.21032767,
        7.39102276,  7.81907928,  7.17999982,  5.40106214,  6.25461858,
       10.80897724, 11.00889351,  7.81186342,  5.67230835,  5.96003158,
        8.19102276,  8.73112115,  6.52866357,  5.52706949,  4.4671902 ,
        8.10897724,  8.19316585,  6.86320746,  6.08603321,  6.16608132,
        6.79102276,  7.77483293,  7.52712017,  5.8504898 ,  6.12882072,
        8.50897724,  8.6267387 ,  6.7990035 ,  6.30474566,  7.22539441,
        8.19102276, 10.96600474,  7.90754278,  5.22765471,  4.40776186,
        8.50897724,  8.12679632,  6.83230576,  5.4844976 ,  5.91578523,
        6.69102276,  8.20840578,  7.03269298,  5.21565194,  5.05562527,
        7.30897724,  7.55289343,  7.27693232,  5.92672361,  5.96003158,
        6.99102276,  6.87038206,  6.68557263,  4.45240618,  5.20354631,
        9.70897724, 10.90586864,  7.84036457,  5.95522475,  6.16543151,
        6.89102276,  8.53201258,  5.92952852,  5.11574518,  4.783206  ,
        7.70897724,  6.74452625,  6.08106276,  5.64620776,  5.95244058,
        6.99102276,  7.40697979,  6.33365116,  4.83522934,  4.65111677,
        8.50897724,  7.28871498,  6.29977521,  5.09927389,  6.87966444,
        6.99102276, 10.09820922,  7.30360661,  5.13735016,  4.28326363,
        8.40897724,  7.5750166 ,  6.05496217,  5.00656879,  4.58470269,
        5.59102276,  6.14985776,  6.17674211,  4.13858807,  4.790797  ,
        6.60897724,  7.14838493,  6.35917805,  5.61290551,  5.48221238,
        6.79102276,  6.94369276,  6.13863875,  4.45480673,  5.46078358,
        8.90897724,  9.97235342,  7.59315097,  4.81395693,  4.91590048,
        6.39102276,  7.53212783,  5.79631951,  4.70922198,  4.02602635,
        7.50897724,  7.39109003,  6.52568932,  5.16827895,  5.08529488,
        6.09102276,  7.06884081,  7.31560938,  4.59041629,  3.87181395,
        7.40897724,  6.81783696,  6.42578256,  4.81875804,  5.69585312,
        6.69102276, 10.16457875,  6.46205907,  4.9708389 ,  3.51849298,
        6.50897724,  7.2665918 ,  5.5437311 ,  4.37173093,  5.07011288,
        5.19102276,  7.60543854,  6.30755057,  4.07198357,  3.76878908,
        7.30897724,  6.88420648,  6.77290291,  5.13017559,  6.2691062 ])

and I'd like to plot two histograms:

x1,y1,_ = plt.hist(data1, alpha=0.4, bins=20)
x2,y2,_ = plt.hist(data2, alpha=0.4, bins=20)

the result is the following:

[Figure: two histograms for data1 and data2]

How can I calculate the area of intersecting region?

I tried to find matching bin centers (to locate the zero-crossing), but I didn't succeed. I also tried taking the minimum of the two bins with the same index, but that doesn't work for negative data.

CodePudding user response:

Make sure you use the same range and number of bins in both histograms:

>>> rng = min(data1.min(), data2.min()), max(data1.max(), data2.max())
>>> n1, bins1, _ = plt.hist(data1, alpha=0.4, bins=30, range=rng)
>>> n2, bins2, _ = plt.hist(data2, alpha=0.4, bins=30, range=rng)


Then the intersection can be easily calculated:

>>> intersection = np.minimum(n1, n2)
>>> area = intersection.sum()
>>> area
20.0
>>> plt.bar(bins1[:-1], intersection, width=bins1[1] - bins1[0], align='edge')
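
Note that `intersection.sum()` counts the samples in the overlapping bars; it is not yet an area in plot units. If you want the geometric area, each overlapping bar can be treated as a rectangle whose height is the smaller count and whose base is the bin width. A minimal sketch, assuming both histograms share the same bins as above (the helper name `histogram_overlap_area` is made up for illustration):

```python
import numpy as np

def histogram_overlap_area(a, b, bins=30):
    """Area (counts x axis units) shared by two histograms on common bins."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Use one shared range so both histograms get identical bin edges.
    rng = (min(a.min(), b.min()), max(a.max(), b.max()))
    n_a, edges = np.histogram(a, bins=bins, range=rng)
    n_b, _ = np.histogram(b, bins=edges)
    widths = np.diff(edges)
    # Overlap per bin: smaller of the two counts times the bin width.
    return float(np.sum(np.minimum(n_a, n_b) * widths))
```

Calling `histogram_overlap_area(data1, data2)` should then give the count-based sum multiplied by the bin width, since all bins are equally wide here.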


CodePudding user response:

You can use:

bin_edges = np.histogram_bin_edges(np.concatenate([data1, data2]), bins=40)
hist1, _ = np.histogram(data1, bins=bin_edges)
hist2, _ = np.histogram(data2, bins=bin_edges)
np.minimum(hist1, hist2).sum()

Output:

20
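
If a scale-free number is more useful than a raw count, one option is to normalize each histogram to unit area with `density=True` and integrate the bin-wise minimum. The result lies between 0 (no overlap) and 1 (identical histograms). This is only a sketch of that idea, not part of the answer above, and the function name `overlap_fraction` is hypothetical:

```python
import numpy as np

def overlap_fraction(a, b, bins=40):
    """Shared area of two unit-area histograms built on common bin edges."""
    # Common edges computed from the pooled data, as in the snippet above.
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    d_a, _ = np.histogram(a, bins=edges, density=True)
    d_b, _ = np.histogram(b, bins=edges, density=True)
    # Integrate the bin-wise minimum of the two densities.
    return float(np.sum(np.minimum(d_a, d_b) * np.diff(edges)))
```

Keep in mind that the value depends on the number of bins, so it is best compared only between histograms built with the same binning.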