I have some high-dimensional boolean data. In this example it is an array with 4 dimensions, but the number of dimensions is arbitrary:
X.shape
(3, 2, 66, 241)
I want to group the dataset into connected regions of True values. This can be done with scipy.ndimage.label, using a connectivity structure that specifies which points in the array are considered to touch. The default 2-D structure is a cross:
[[0,1,0],
[1,1,1],
[0,1,0]]
This is easy to extend to higher dimensions if all of those dimensions are connected. However, I want to generate such a structure programmatically from a list of which dimensions are connected to which:
# Connect dims 2 and 3 only, separately within each slice of dims 0 and 1:
dim_connections=[[0],[1],[2,3]]
# Now we want two separate connected subspaces in our data (dims 0-1 and dims 2-3):
dim_connections=[[0,1],[2,3]]
For individual cases I can work out the correct structuring element with some hard thinking, but I am struggling to find the general rule. For clarity, I want something like:
my_structure = construct_arbitrary_structure(ndim, dim_connections)
the_correct_result = scipy.ndimage.label(X, structure=my_structure)
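For the fully connected case, SciPy already has scipy.ndimage.generate_binary_structure(rank, connectivity). With connectivity=1 it returns the cross above (the same structure label builds when none is given) and its higher-dimensional equivalent; with connectivity equal to the rank, every element of the 3x...x3 block is a neighbour. A quick illustration:

```python
import numpy as np
from scipy import ndimage

# Connectivity 1: only face neighbours, i.e. the 2-D cross from the question.
cross_2d = ndimage.generate_binary_structure(2, 1)

# The same rule in 4-D: the centre plus one step along each of the 4 axes.
cross_4d = ndimage.generate_binary_structure(4, 1)

# Connectivity equal to the rank: every element of the 3x3x3x3 block is a neighbour.
full_4d = ndimage.generate_binary_structure(4, 4)

print(cross_2d.astype(int))
print(cross_4d.shape, int(cross_4d.sum()), int(full_4d.sum()))  # (3, 3, 3, 3) 9 81
```

What it cannot express is connectivity along only some of the dimensions, which is what dim_connections is meant to describe.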
Answer:
The key to constructing an arbitrary structure for scipy.ndimage.label is to understand the concept of a neighborhood: the set of points that are considered connected to a given point. For example, in a 2D array the full neighborhood of a point (x,y) is the 3x3 block {(x-1,y-1), (x-1,y), (x-1,y+1), (x,y-1), (x,y), (x,y+1), (x+1,y-1), (x+1,y), (x+1,y+1)}. The default cross structure keeps only the point itself and its four face neighbours (x±1,y) and (x,y±1).
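In scipy.ndimage terms, the structure is a 3x3 (or 3x...x3) boolean array whose centre stands for (x,y), and each True element marks one neighbour offset. A short sketch that lists those offsets with np.argwhere; the helper name neighbour_offsets is hypothetical, just for illustration:

```python
import numpy as np
from scipy import ndimage

def neighbour_offsets(structure):
    """Hypothetical helper: return the offsets marked True in a 3x...x3 structure."""
    structure = np.asarray(structure, dtype=bool)
    # argwhere gives indices in {0, 1, 2}; subtracting 1 centres them on the point.
    return [tuple(int(v) for v in idx) for idx in np.argwhere(structure) - 1]

# Full 3x3 neighbourhood: all nine offsets.
full = ndimage.generate_binary_structure(2, 2)
# Default cross: the point itself and its four face neighbours.
cross = ndimage.generate_binary_structure(2, 1)

print(neighbour_offsets(cross))
# [(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)]
print(len(neighbour_offsets(full)))
# 9
```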
In order to construct an arbitrary structure for scipy.ndimage.label, we need to define this neighborhood; the same neighborhood is then applied to every point in the data. To do this, we define which dimensions of the data are connected to each other. For example, if we have a 4D array and we want to connect dimensions 0 and 1, and dimensions 2 and 3, then our set of connections is [[0,1],[2,3]].
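One way to turn such a list into a yes/no rule for each neighbour offset, as a sketch under two assumptions of mine: an offset is allowed only if every axis it moves along belongs to the same group, and single-axis groups such as [0] are treated as unconnected (which matches the question's [[0],[1],[2,3]] example). offset_allowed is a hypothetical helper name:

```python
def offset_allowed(offset, dim_connections):
    """Hypothetical helper: may a point step by `offset` (entries in -1, 0, 1)?

    Assumptions: the moved axes must all lie in one group, and groups with a
    single axis count as unconnected.
    """
    moved = {axis for axis, step in enumerate(offset) if step != 0}
    if not moved:
        return True  # the zero offset is the point itself
    return any(len(group) > 1 and moved <= set(group) for group in dim_connections)

connections = [[0, 1], [2, 3]]
print(offset_allowed((1, -1, 0, 0), connections))  # True: diagonal inside group [0, 1]
print(offset_allowed((0, 0, 0, 1), connections))   # True: single step along axis 3
print(offset_allowed((1, 0, 1, 0), connections))   # False: mixes axes from both groups
```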
Once we have defined our set of connections, we can construct the structuring element. scipy.ndimage.label requires it to have the same number of dimensions as the data and a length of 3 along every axis, so for a 4D array its shape is (3, 3, 3, 3). The centre element, at index (1, 1, 1, 1), is the point itself, and every other element stands for one neighbour offset of -1, 0 or +1 along each axis.
An element is set to 1 if the offset it represents only moves along dimensions that are connected to each other, and 0 otherwise. For example, if we have a 4D array and we want to connect dimensions 0 and 1, and dimensions 2 and 3 (allowing diagonal moves inside each pair), then the structure is zero everywhere except:
structure[:, :, 1, 1] = 1   # moves along dims 0 and/or 1
structure[1, 1, :, :] = 1   # moves along dims 2 and/or 3
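Once we have constructed this structure, we pass it to scipy.ndimage.label as its structure argument to find the connected regions of our data. Note that connectivity is transitive: with [[0,1],[2,3]] a region can still spread across all four dimensions by alternating moves within each group; the groups only rule out single moves that mix dimensions from different groups.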
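A minimal sketch of that construction, under my own assumptions: full (diagonal) connectivity inside each group, and single-dimension groups such as [0] left unconnected so that each slice along that dimension is labelled separately. Instead of looping over offsets, it writes the block returned by scipy.ndimage.generate_binary_structure into the slice that spans each group's axes, with index 1 on every other axis. The optional connectivity parameter is an addition for illustration; passing connectivity=1 keeps only face neighbours inside each group.

```python
import numpy as np
from scipy import ndimage

def construct_arbitrary_structure(ndim, dim_connections, connectivity=None):
    """Sketch: 3x...x3 structuring element for ndimage.label.

    Assumptions: only moves within a single group are allowed, groups with
    one axis are unconnected, and inside a group of k axes the neighbours
    are those of generate_binary_structure(k, connectivity) (default k,
    i.e. all diagonals).
    """
    structure = np.zeros((3,) * ndim, dtype=bool)
    structure[(1,) * ndim] = True  # the centre: the point itself
    for group in dim_connections:
        k = len(group)
        if k < 2:
            continue  # single-axis group: no connections along that axis
        block = ndimage.generate_binary_structure(
            k, k if connectivity is None else connectivity)
        # Full slice on the group's axes, centre index 1 on all other axes.
        index = [1] * ndim
        for axis in group:
            index[axis] = slice(None)
        structure[tuple(index)] |= block
    return structure

s1 = construct_arbitrary_structure(4, [[0], [1], [2, 3]])
s2 = construct_arbitrary_structure(4, [[0, 1], [2, 3]])
print(int(s1.sum()), int(s2.sum()))  # 9 17

# Two True points that differ only along axis 0 stay separate under s1.
X = np.zeros((2, 1, 3, 3), dtype=bool)
X[0, 0, 0, 0] = X[1, 0, 0, 0] = True
labels, n = ndimage.label(X, structure=s1)
print(n)  # 2
```

Because generate_binary_structure is symmetric in its axes, the order of dimensions inside a group does not matter, and the result is centrosymmetric as label requires.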
Answer:
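This should work for you. It visits every neighbour offset in the 3x...x3 structure and keeps those that only move along dimensions from a single group; single-dimension groups such as [0] are treated as unconnected, matching the [[0],[1],[2,3]] example:
import itertools

import numpy as np

def construct_arbitrary_structure(ndim, dim_connections):
    # Map each axis to the index of its group. Axes in single-axis groups
    # (or in no group) are left out, so they get no connections.
    group_of = {}
    for group_id, group in enumerate(dim_connections):
        if len(group) > 1:
            for axis in group:
                group_of[axis] = group_id
    # 3x3x...x3 boolean array; index 1 on every axis is the point itself.
    structure = np.zeros([3] * ndim, dtype=bool)
    # Visit every neighbour offset in {-1, 0, 1}**ndim.
    for offset in itertools.product((-1, 0, 1), repeat=ndim):
        moved = [axis for axis in range(ndim) if offset[axis] != 0]
        groups = {group_of.get(axis) for axis in moved}
        # Keep the centre and any offset whose moved axes all share one
        # connected group (None marks an unconnected axis). The rule depends
        # only on which axes move, so the result is centrosymmetric, as
        # scipy.ndimage.label requires.
        if not moved or (len(groups) == 1 and None not in groups):
            structure[tuple(step + 1 for step in offset)] = True
    return structure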
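To check a structure like this, one option (a sketch, using random test data of my own) is to compare against labelling each slice separately: for [[0],[1],[2,3]], a single 4-D label call should find exactly as many regions as labelling every 2-D slice of dims 2 and 3 on its own, and no region should span two slices. The structure is written out by hand here so the snippet runs on its own; it assumes dims 0 and 1 are unconnected and diagonals within dims 2 and 3 count as neighbours.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
X = rng.random((3, 2, 20, 30)) > 0.6  # random boolean data, same layout as the question's X

# Structure for [[0], [1], [2, 3]]: only dims 2 and 3 connect (diagonals included).
structure = np.zeros((3, 3, 3, 3), dtype=bool)
structure[1, 1, :, :] = True

labels, n_total = ndimage.label(X, structure=structure)

# Reference: label every 2-D slice of dims 2 and 3 on its own.
full_2d = ndimage.generate_binary_structure(2, 2)
n_per_slice = sum(
    ndimage.label(X[i, j], structure=full_2d)[1]
    for i in range(X.shape[0])
    for j in range(X.shape[1])
)
print(n_total == n_per_slice)  # True

# Every region should live in exactly one (i, j) slice.
labels_in_slices = [
    np.unique(labels[i, j][labels[i, j] > 0]).size
    for i in range(X.shape[0])
    for j in range(X.shape[1])
]
print(sum(labels_in_slices) == n_total)  # True
```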