I have a text file with 4 columns: the first 3 are the x, y and z coordinates of a data point, and the 4th column is the value of the data point at those coordinates.
For example:
0 1 2 10000
0 1 3 20000
0 2 1 30000
1 0 0 40000
1 1 1 50000
I want to make a plot with the x-coordinate on the horizontal axis and the TOTAL value at that x-coordinate on the vertical axis. This is basically a histogram of the .txt dataset above, marginalized over y and z.
For example, for the above dataset, I would have 2 points in my plot: (0, 60000) and (1, 90000), where the first number is the x-coordinate and the second number is the total value.
I have tried to read about the np.histogramdd function, but when I feed in my .txt dataset, it outputs a 4-dimensional tensor. I then sum across its 2nd and 3rd axes (Matlab notation) to obtain a 2D tensor. This has shape (10, 10).
How could I obtain the points (0, 60000) and (1, 90000) from above?
Thank you!
CodePudding user response:
You can ignore the y and z columns and just do a one-dimensional histogram of the x column, weighted by the value column. For instance,
In [13]: import numpy as np
In [14]: a = np.array([[0,1,2,10000], [0,1,3,20000], [0,2,1,30000], [1,0,0,40000], [1,1,1,50000]])
In [15]: print(np.histogram(a[:,0], weights=a[:,3]))
(array([60000,     0,     0,     0,     0,     0,     0,     0,     0, 90000]),
 array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ]))
With the default of 10 equal-width bins, the two totals land in the first and last bins. You can use the usual bins= parameter in np.histogram to set whatever bins you need.
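To get exactly the (x, total) pairs from the question, here is a minimal sketch of two ways to do it. It assumes the x-coordinates are integers (needed for the bin edges in the first variant), and it uses an in-memory string named sample as a stand-in for the real text file; with an actual file you would pass its path to np.loadtxt instead. The variable names are illustrative only.

```python
import io
import numpy as np

# Stand-in for the .txt file (assumption for this sketch);
# with a real file, pass its path to np.loadtxt.
sample = io.StringIO(
    "0 1 2 10000\n"
    "0 1 3 20000\n"
    "0 2 1 30000\n"
    "1 0 0 40000\n"
    "1 1 1 50000\n"
)
data = np.loadtxt(sample)

x = data[:, 0]        # x-coordinates
values = data[:, 3]   # value at each (x, y, z)

# Variant 1: np.histogram with one bin centred on each integer x value.
# Assumes integer x-coordinates; the edges here are [-0.5, 0.5, 1.5].
edges = np.arange(x.min(), x.max() + 2) - 0.5
hist_totals, _ = np.histogram(x, bins=edges, weights=values)

# Variant 2: group by the distinct x values, no bins needed.
# inverse[i] is the index into xs of the x value in row i.
xs, inverse = np.unique(x, return_inverse=True)
totals = np.bincount(inverse, weights=values)

for xi, total in zip(xs, totals):
    print(xi, total)
```

The arrays xs and totals can then be handed to whatever plotting call you prefer. The second variant only reports x values that actually occur in the data, while the first also produces zero-valued bins for any integer x in between that is missing.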