Hi, how's it going? I have a 1D array like this:
locations = [1,2,4,4,3,2,2,4,1,4]
Each entry represents one visit to that particular location. In this case there are 4 locations: 1, 2, 3 and 4. I want to transform this into a 2x2 numpy array where the top left box holds the number of visits to location 1, the top right to location 2, the bottom left to location 3, and the bottom right to location 4.
Something like this:
np.array([[2,3],[1,4]])
How would I go about doing this in a way that would scale well when given a much larger location array? Thanks so much and have a lovely day.
CodePudding user response:
What you want to do is count the unique values. Here is an example:
import numpy as np
locations = [1,2,4,4,3,2,2,4,1,4]
result = np.array(np.unique(locations, return_counts=True))
print(result)
The printout would be (first row: the unique locations, second row: their counts)
[[1 2 3 4]
 [2 3 1 4]]
CodePudding user response:
I think there are two parts to your question:
- You want to count the incidence of each number that appears in your original array.
- You want to format the resulting array (which contains the incidence of each number) as a square matrix. It wasn't clear to me whether your more general case will always consist only of the numbers 1-4 (and always result in a 2x2 matrix) or whether you want to consider more general cases, e.g. values 1-16 that result in a 4x4 matrix. I'm assuming the former for now.
Pieter's answer has a good way to count the incidences (step 1), but I've modified it to separate the unique values vs. the counts:
import numpy as np
locations = [1,2,4,4,3,2,2,4,1,4]
values, counts = np.unique(locations, return_counts=True)
Pieter's answer does not return a 2x2 array (step 2), but you can get that by reshaping the counts array here:
counts2x2=counts.reshape([2,2])
print(counts2x2)
the output of which will be:
[[2 3]
[1 4]]
There is one caveat: this method will not work if your original array does not contain every one of the values 1-4. For example, if it only contains 1, 2, and 4, np.unique will simply return a length-3 array of counts rather than a length-4 array with a zero in the 3 spot, and the reshape to 2x2 will then fail. With a very large input array that is unlikely to be an issue, but if you want to be safer I would recommend np.histogram, which counts the number of entries falling in each bin and reports empty bins as zero:
import numpy as np
locations = [1,2,4,4,3,2,2,4,1,4]
locations_no3 = [1,2,4,4,2,2,2,4,1,4]
hist,edges=np.histogram(locations,bins=np.arange(0.5,5))
hist_no3,edges_no3=np.histogram(locations_no3,bins=np.arange(0.5,5))
print("With 3:")
print(hist.reshape(2,2))
print("Without 3:")
print(hist_no3.reshape(2,2))
The output is shown below. Note that this solution works whether or not all the numbers 1-4 are represented in the original array.
With 3:
[[2 3]
[1 4]]
Without 3:
[[2 4]
[0 4]]
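If the more general case does come up (say, locations 1-16 laid out as a 4x4 grid), one possible sketch uses np.bincount with its minlength argument, which also reports unvisited locations as zero. The helper name visit_grid and its side parameter are made up here for illustration, and the sketch assumes a non-empty list of integer labels running from 1 to side*side, filled in row by row:

```python
import numpy as np

def visit_grid(locations, side):
    """Count visits to locations 1..side*side and lay them out row by row.

    Hypothetical helper: assumes integer labels starting at 1.
    """
    locations = np.asarray(locations)
    n = side * side
    # Reject labels that would not fit in the grid.
    if locations.min() < 1 or locations.max() > n:
        raise ValueError("location labels must lie in 1..side*side")
    # bincount counts occurrences of 0, 1, ..., max; minlength pads the
    # result so that labels which never occur still get a zero count.
    counts = np.bincount(locations, minlength=n + 1)
    # Slot 0 belongs to the unused label 0, so drop it before reshaping.
    return counts[1:].reshape(side, side)

print(visit_grid([1, 2, 4, 4, 3, 2, 2, 4, 1, 4], 2))
print(visit_grid([1, 2, 4, 4, 2, 2, 2, 4, 1, 4], 2))
```

For the two example arrays this should reproduce the same 2x2 grids as the np.histogram version above. Passing side=4 would treat the labels as 1-16 and return a 4x4 grid.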