1D histogram from 4 column txt dataset in python?

Time:02-02

I have a text file with 4 columns, the first 3 are the x, y and z coordinates of one datapoint, and the 4th column is the value of the datapoint at that x, y, z set of coordinates.

For example:

0 1 2 10000
0 1 3 20000
0 2 1 30000
1 0 0 40000
1 1 1 50000

I want to make a plot with the x-coordinate value on the horizontal axis and the TOTAL value at that x-coordinate on the vertical axis. This is essentially a histogram of the .txt dataset above, marginalized over y and z.

For example, for the above dataset I would have 2 points in my plot, (0, 60000) and (1, 90000), where the first number is the x-coordinate value and the second is the total value.
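If the x coordinates are discrete, the exact (x, total) pairs can also be computed without any binning. The sketch below loads the 4-column text with np.loadtxt (a StringIO stands in for the file; "data.txt" is a hypothetical filename), then groups rows by x with np.unique and sums the values per group with np.bincount. This is an alternative to a histogram, shown under the assumption that you want one point per distinct x value.

```python
import io

import numpy as np

# Sample data from the question. With a real file you would call
# np.loadtxt("data.txt") instead ("data.txt" is a hypothetical name).
text = """0 1 2 10000
0 1 3 20000
0 2 1 30000
1 0 0 40000
1 1 1 50000
"""
data = np.loadtxt(io.StringIO(text))

x = data[:, 0]       # x coordinate of each data point
values = data[:, 3]  # value stored at (x, y, z)

# Distinct x values, plus, for every row, its index into x_unique.
x_unique, inverse = np.unique(x, return_inverse=True)

# Add up the values of all rows that share the same x (marginalizes y and z).
totals = np.bincount(inverse, weights=values)

for xv, total in zip(x_unique, totals):
    print(xv, total)
```

For the sample data this pairs x = 0 with 60000 and x = 1 with 90000. The two arrays can be passed directly to a plotting call such as matplotlib's plt.plot(x_unique, totals, "o").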

I have read about the np.histogramdd function, but when I feed it my .txt dataset, it outputs a 4-dimensional array. I then sum across its 2nd and 3rd axes (Matlab notation) to obtain a 2D array, which has shape (10, 10).
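The 4-dimensional output appears because np.histogramdd treats every column of the input as a separate dimension, and by default uses 10 bins per dimension, giving shape (10, 10, 10, 10). A minimal sketch of the histogramdd route, assuming integer-valued coordinates: pass only the x, y, z columns as the sample, pass the 4th column as weights, put one bin around each integer, and sum over the y and z axes (axes 1 and 2 in NumPy's 0-based indexing).

```python
import numpy as np

a = np.array([[0, 1, 2, 10000],
              [0, 1, 3, 20000],
              [0, 2, 1, 30000],
              [1, 0, 0, 40000],
              [1, 1, 1, 50000]])

coords = a[:, :3]  # x, y, z only: these are the histogram dimensions
values = a[:, 3]   # the 4th column is a weight, not a 4th dimension

# One bin per integer coordinate: edges at -0.5, 0.5, 1.5, ... (assumes integer data).
edges = [np.arange(col.min(), col.max() + 2) - 0.5 for col in coords.T]

hist, bin_edges = np.histogramdd(coords, bins=edges, weights=values)
# For this sample data, hist has shape (2, 3, 4): (#x bins, #y bins, #z bins).

# Marginalize over y and z (NumPy axes 1 and 2).
totals = hist.sum(axis=(1, 2))
x_centers = (bin_edges[0][:-1] + bin_edges[0][1:]) / 2  # integer x values

print(list(zip(x_centers, totals)))
```

Summing the 3-D weighted histogram over the y and z axes leaves a 1-D array indexed by x, which holds the 60000 and 90000 totals for x = 0 and x = 1.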

How could I obtain the (0, 60000) and (1, 90000) from above?

Thank you!

Answer:

You can ignore the y and z columns and just compute a one-dimensional histogram of the x column, weighted by the values in the 4th column. For instance,

In [13]: import numpy as np

In [14]: a = np.array([[0,1,2,10000], [0,1,3,20000], [0,2,1,30000], [1,0,0,40000], [1,1,1,50000]])

In [15]: print(np.histogram(a[:,0], weights=a[:,3]))
(array([60000,     0,     0,     0,     0,     0,     0,     0,     0,
       90000]), array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ]))

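By default np.histogram uses 10 equal-width bins spanning the range of the data, which is why empty bins appear between 0 and 1. You can use the usual bins= parameter of np.histogram to set whatever bins you need.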
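For example, assuming the x coordinates are integers, explicit bin edges placed halfway between consecutive integers give exactly one bin per x value. This sketch uses only np.histogram with bins= and weights=, as in the session above.

```python
import numpy as np

a = np.array([[0, 1, 2, 10000],
              [0, 1, 3, 20000],
              [0, 2, 1, 30000],
              [1, 0, 0, 40000],
              [1, 1, 1, 50000]])

x = a[:, 0]       # x coordinates
values = a[:, 3]  # weights: the value at each (x, y, z)

# Edges at -0.5, 0.5, 1.5, ... so each integer x falls in its own bin
# (assumes integer-valued x).
edges = np.arange(x.min(), x.max() + 2) - 0.5

totals, edges = np.histogram(x, bins=edges, weights=values)
centers = (edges[:-1] + edges[1:]) / 2  # the integer x values

print(centers, totals)
```

Integers inside the range that have no data get a total of 0. To draw the plot, centers and totals can be passed to matplotlib, e.g. plt.bar(centers, totals).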
