I have a grid of nodes (represented by ones). I would like to quickly and simply (in a way that is both readable and fast) count the number of nodes that share a column or row with another node.
Here is my solution (can it be improved?):
grid=[[0,0,0,0],[1,1,1,1],[0,0,0,1],[0,0,1,1],[0,0,0,1]]
rowlen=len(grid)
collen=len(grid[0])
rd={i: 0 for i in range(rowlen)}
cd={i: 0 for i in range(collen)}
cl=[]
rl=[]
for (rowi, row) in enumerate(grid):
for (coli,x) in enumerate(row):
if x>0:
cd[coli]=cd[coli] 1
rd[rowi]=rd[rowi] 1
cl.append(coli)
rl.append(rowi)
coords=zip(rl,cl)
coordslist=list(coords)
bools=[int(bool(((rd[a]-1) or (cd[b]-1)))) for (a,b) in coordslist]
answer=sum(bools)
CodePudding user response:
Sometimes fast and readable are the same, but more often they depend on your input and wishes. For example, what you are doing is still considered fast for most people, less than a second for a grid of 2500x2500
or 6.250.000
entries.
Now for your question it is noticable, that if you know the row and column counts of 1's
, you can quickly look up if there are more than one 1
in a row or column.
Eventhough fast
and readable
are personally, I will share with you my solutions to the problem.
Pure python
This solution uses only pure python that is available in the library.
import itertools
rows = [row.count(1) for row in grid]
cols = [col.count(1) for col in zip(*grid)]
width, height = len(grid), len(grid[0])
indices = itertools.product(range(width), range(height))
answer = sum(1 for irow, icol in indices if grid[irow][icol] == 1 and (rows[irow] > 1 or cols[icol] > 1))
For this I am using several small tricks
or perks
of the python language, which are the following:
row.count(1)
, which counts the number of times the element1
occurs in a list.zip(*grid)
, which is a reverse zipping, see this post fromMike Corcoran
for more information.itertools.product
, which generates all indices for me (documentation)sum(1 for each in val if <condition on each>)
, which is a generator expression.
For me the above solution, is considered very readable and concise. Now on the speed front, the solution is a bit faster than yours, but how much you might wonder? Well if we go to a grid of 5000x5000
or 25.000.000
elements, the time difference is about 4 seconds (from 12s to 8s).
Well that is kinda bad... All that trouble for 4 seconds on something that would be considered enormous for most programmers, and probably only be used once. And if you have that many elements, you will probably not use a list anymore, but a numpy
array (docs).
Pure Numpy
So how would the solution look like if it was a numpy
array?
import numpy as np
# Convert grid from `list` to a `np.ndarray`
arr = np.array(grid)
# Get to row and column totals
cols = arr.sum(axis=0)
rows = arr.sum(axis=1)
# Get all indices that are `1`, and combine the x and y coordinates by zip.
indices = zip(*np.where(arr == 1))
answer = sum(1 for irow, icol in indices if rows[irow] > 1 or cols[icol] > 1)
The above is even more readable than before, you just need to know a few numpy convetions and methods. Namely, that numpy
uses row, col
for indexing, instead of the traditional (col, row)
or x, y
indexing. Therefore axis=0
, loops over the rows, which give you the column totals. And that np.where
returns the indices of the where the condition inside the brackets is True
.
Now the solution is slightly slower than the pure python solution, because it has to convert the list
to an array
, but if your data was already a numpy array, your solution and the pure python solution would take almost twice the amount of time compared to the numpy solution.
Conclusion
Above are two different alternative solutions that would solve the same problem, but the readability and speed are depending on your knowledge of the language and your needs.
If the grid is not going to be much langer than a 1000 elements, all solutions are equally quick, namely instantaneously from a user perspective.
And the readability is as good as your skills are.