Single histogram from multiple columns in data frame

Time:07-29

I have a DataFrame that stores multiple sets of y-values measured over a range of wavelengths (the x-values), like this:

              x        y0        y1  ...       y12       y13       y14
659  335.072723 -0.003132 -0.002903  ...  0.002981  0.007160  0.003514
660  335.285309 -0.001403 -0.002011  ...  0.003565  0.003635  0.002760
661  335.497925 -0.002544  0.002261  ...  0.007555  0.004373  0.011329
662  335.710510 -0.000546 -0.004526  ...  0.003469  0.004768  0.006191
663  335.923157 -0.011268 -0.001985  ...  0.001623  0.003106  0.006442
..          ...       ...       ...  ...       ...       ...       ...
771  358.970215 -0.000100 -0.000876  ...  0.001332 -0.000311  0.002851
772  359.184387 -0.000897  0.001286  ... -0.000037  0.000300  0.001410
773  359.398560 -0.001188  0.001162  ... -0.000901  0.000200  0.001739
774  359.612762  0.001268 -0.001201  ... -0.000545  0.001514  0.000689
775  359.826965  0.001439 -0.000671  ...  0.000295  0.002545  0.002769

I can create a histogram from the first y column (y0) using:

plt.hist(megDiff['y0'], bins = 100, label = "y0 Data Set")

However, I would like to create a single histogram from all of the y-value data sets. I am not sure how to do this; if anyone knows, I would greatly appreciate some help.

Answer:

You can combine (concatenate) the values of all the y columns into one sequence and then plot that.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'x': [1, 2, 3],
                   'y1': [11, 23, 31],
                   'y2': [12, 22, 32],
                   'y3': [13, 21, 33]})

# Keep every column except 'x' and flatten the values into one 1-D array
all_y = df.loc[:, df.columns != 'x'].to_numpy().ravel()

plt.hist(all_y, bins=100, label="All y data sets")
plt.legend()
plt.show()

Sample result: [image of the resulting histogram not preserved]
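A variant of the approach above, sketched under the assumption that every column other than x holds y values: pandas' DataFrame.melt reshapes the wide table into long format, so all y values end up in a single column and one plain plt.hist call covers them. The column names "variable" and "y" and the bin count are illustrative choices, not taken from the question. A side benefit of melt is that it records which original column each value came from.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Small stand-in for the question's data: one x column plus several y columns
df = pd.DataFrame({'x': [1, 2, 3],
                   'y1': [11, 23, 31],
                   'y2': [12, 22, 32],
                   'y3': [13, 21, 33]})

# Wide -> long: one row per (x, original column, value).
# 'variable' holds the original column name, 'y' holds the measurement.
long_df = df.melt(id_vars='x', var_name='variable', value_name='y')

# One histogram over every y value, regardless of which column it came from
counts, edges, patches = plt.hist(long_df['y'], bins=10, label='All y data sets')
plt.xlabel('y')
plt.ylabel('Count')
plt.legend()
plt.show()
```

With the question's data, replacing the example DataFrame with megDiff should work the same way, since its first column is also named x.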

Answer:

If you want a separate label for each column, you can pass all the columns you are interested in directly to the hist function. Matplotlib then draws one histogram per column, with the bars for each bin placed side by side.

Here is an example which plots every column except the first (x):

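import matplotlib.pyplot as plt
# df is the DataFrame from the question; its first column is 'x'
y_cols = list(df.columns[1:])
plt.hist(df[y_cols].to_numpy(), bins=100, label=y_cols)
plt.legend()
plt.show()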

This Matplotlib gallery example may also help: https://matplotlib.org/stable/gallery/statistics/histogram_multihist.html
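If you want one combined histogram but still want to see each column's contribution, Matplotlib's stacked=True option draws each column's counts on top of the previous ones, so the height of each full stack is the combined count for that bin. The sketch below uses a small made-up DataFrame rather than the question's data, and the bin count is an arbitrary choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'x': [1, 2, 3],
                   'y1': [11, 23, 31],
                   'y2': [12, 22, 32],
                   'y3': [13, 21, 33]})

y_cols = list(df.columns[1:])  # every column except the first ('x')
data = df[y_cols].to_numpy()   # shape (n_rows, n_cols); each column is one data set

# stacked=True piles the per-column bars on top of each other; the legend
# still shows which column contributed what.
counts, edges, patches = plt.hist(data, bins=10, stacked=True, label=y_cols)
plt.legend()
plt.show()

# Sanity check: the top layer of the stack equals a plain histogram
# of all values computed with the same bin edges
combined, _ = np.histogram(data.ravel(), bins=edges)
print(np.array_equal(counts[-1], combined))
```

When several data sets are passed together with an integer bins value, Matplotlib computes one shared set of bin edges from all of them, which is why the top layer of the stack matches the flattened histogram and the final line should print True.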
