I have a 2d NumPy array that looks like this:
array([[1, 1],
       [1, 2],
       [2, 1],
       [2, 2],
       [3, 1],
       [5, 1],
       [5, 2]])
and I want to group it and have an output that looks something like this:
          Col1  Col2
group 1:  1-2,  1-2
group 2:  3-3,  1-1
group 3:  5-5,  1-2
I want to group the columns based on whether their values are consecutive.
So, for each unique value in column 1, group the values in column 2 if they are consecutive between rows. Then, for each unique grouping of column 2, group the values in column 1 if they are consecutive between rows.
The result can be thought of as the corner points of a grid. In the above example, group 1 is a square grid, group 2 is a point, and group 3 is a flat line.
My system won't allow me to use pandas, so I cannot use groupby from that library, but I can use other standard libraries.
Any help is appreciated. Thank you
CodePudding user response:
Here you go ...
Steps are:
- Get a list xUnique of unique column 1 values with sort order preserved.
- Build a list xRanges of items of the form [col1_value, [col2_min, col2_max]] holding the column 2 ranges for each column 1 value.
- Build a list xGroups of items of the form [[col1_min, col1_max], [col2_min, col2_max]] where the [col1_min, col1_max] part is created by merging the col1_value part of consecutive items in xRanges if they differ by 1 and have identical [col2_min, col2_max] value ranges for column 2.
- Turn the ranges in each item of xGroups into strings and print with the required row and column headings.
- Also package and print as a numpy.array to match the form of the input.
import numpy as np

data = np.array([
    [1, 1],
    [1, 2],
    [2, 1],
    [2, 2],
    [3, 1],
    [5, 1],
    [5, 2]])

# Unique column 1 values, sorted explicitly (a bare set does not guarantee order).
xUnique = sorted({pair[0] for pair in data})
xRanges = list(zip(xUnique, [[0, 0] for _ in range(len(xUnique))]))
rows, cols = data.shape
iRange = -1
for i in range(rows):
    # A new column 1 value starts a new range; its first row gives the column 2 minimum.
    if i == 0 or data[i, 0] > data[i - 1, 0]:
        iRange += 1
        xRanges[iRange][1][0] = data[i, 1]
    # The latest row seen for this column 1 value gives the column 2 maximum.
    xRanges[iRange][1][1] = data[i, 1]

xGroups = []
for i in range(len(xRanges)):
    # Extend the previous group when column 1 is consecutive and the column 2 ranges match.
    if i and xRanges[i][0] - xRanges[i - 1][0] == 1 and xRanges[i][1] == xRanges[i - 1][1]:
        xGroups[-1][0][1] = xRanges[i][0]
    else:
        xGroups += [[[xRanges[i][0], xRanges[i][0]], xRanges[i][1]]]

xGroupStrs = [[f'{a}-{b}' for a, b in row] for row in xGroups]
groupArray = np.array(xGroupStrs)
print(groupArray)
print()
print(f'{"":<10}{"Col1":<8}{"Col2":<8}')
for i, (col1, col2) in enumerate(xGroupStrs):
    label = f'group {i}:'
    print(f'{label:<10}{col1:<8}{col2:<8}')
Output:
[['1-2' '1-2']
 ['3-3' '1-1']
 ['5-5' '1-2']]

          Col1    Col2
group 0:  1-2     1-2
group 1:  3-3     1-1
group 2:  5-5     1-2
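
If it helps, here is a more compact sketch of the same steps using itertools.groupby from the standard library. The function name group_ranges is made up for this example. It assumes the rows are already sorted by column 1 and that the column 2 values for each column 1 value form one consecutive run, which holds for the sample data but is not checked.

```python
from itertools import groupby


def group_ranges(data):
    """Return groups of the form [[col1_min, col1_max], [col2_min, col2_max]].

    Hypothetical helper: assumes rows are sorted by column 1 and that the
    column 2 values for each column 1 value form a single consecutive run.
    """
    # Convert to plain Python ints so this works for nested lists
    # as well as for a 2D NumPy array.
    rows = [[int(v) for v in row] for row in data]

    # One [col1_value, [col2_min, col2_max]] item per column 1 value.
    ranges = []
    for x, same_x in groupby(rows, key=lambda row: row[0]):
        ys = [row[1] for row in same_x]
        ranges.append([x, [min(ys), max(ys)]])

    # Merge neighbouring items whose column 1 values differ by 1
    # and whose column 2 ranges are identical.
    groups = []
    for x, y_range in ranges:
        if groups and x - groups[-1][0][1] == 1 and y_range == groups[-1][1]:
            groups[-1][0][1] = x
        else:
            groups.append([[x, x], y_range])
    return groups


data = [[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [5, 1], [5, 2]]
for i, (col1, col2) in enumerate(group_ranges(data), start=1):
    print(f"group {i}: {col1[0]}-{col1[1]}, {col2[0]}-{col2[1]}")
```

Starting enumerate at 1 gives the group numbering used in the question (group 1, group 2, group 3) rather than the zero-based numbering in the output above.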