Is it possible to implement PRAM CRCW algorithms on consumer processors with the same big O scalabil-CodePudding

I've come to learn that there are linear time sorting algorithms that don't run by comparisons like radix sort. My hope is to have a sorting algorithm that runs in linear time but can also run in constant time by running n threads for n elements. From the research I've done, this seems possible on a PRAM CRCW machine but I've found conflicting information as to whether the algorithm that runs on a PRAM CRCW machine can be run on a standard consumer computer in the same constant time.

FYI, the algorithm in question is here. This is pretty interesting as well.

Is it possible?

CodePudding user response：

Q : "Is it possible _{( to implement PRAM CRCW on consumer processor )} ?"

A :
Let's clarify the facts first. We can agree on what "consumer"-processors are - the most often a COTS term is right this - a Custom-Over-The-Shelf processor, as anyone can go and buy. So is the set of properties of any such COTS hardware, being pre-defined by the silicon structures pre-fabricated "inside" such processor.

On the contrary, the CRCW PRAM term is knowingly & intentionally a highly abstract, ultimately idealised property of any such processor architecture, that can (without having any limits in time or other compromises) Concurrently Read (under any and all levels of parallelism) and also Concurrently Write (under any and all levels of parallelism) from/into any memory location ("address") all at once, adding some additional créme-a-la-créme property, like to performe a sum of all CW-s, before actually storing a such resulting value. Any such physical implementation of these abstract properties, that meets them all under any circumstances, having no exceptions to doing so in full parallel-mode, can be called a CRCW PRAM and never otherwise.

This said, the CRCW PRAM architecture is by far not met, not even being anywhere close to it, in any of the current COTS processor hardware silicon.

Such question is leading, by definition, to actually unachievable wish to have an architecture-A get "implemented" by using an architecture-B (which can never be turned into meeting an architecture-A, even if composing many such COTS processors (as defined) into some interconnected macro-structure, which may turn some of the COTS hardware properties a bit "closer" to the CRCW PRAM, yet at such devastatingly adverse costs or slowness of operations, that such attempts can result but in something ultra-expensive ultra-power-inefficient ultra-slow ( being about N^{2 ~ 3} sub-sampled and having a need to artificially "wait" for all the slowest parts for a full-width of the parallelism to get physically completed, if viewed from the macro-structure point of view).

Using any amount of superscalar, M-way pipelined, out of order executing CISC silicon for achieving a macro-structure topological trick just for simulating a "slowed down" CRCW PRAM is IMHO technically not a right way to go ( if we want to enjoy a reasonably practical O( k )-sorting machine ).

If using a current level of QPU processors, we may "somehow" enjoy a constant time QUBO (a single hardware-instruction quantum processor in the current line of the D-WAVE systems' machines ), I would hesitate to consider this corner-case (topologically setup to bear "inital" state and letting The Nature ( the laws of physics ) to "execute" a quantum-annealing "algorithm" to result in a statistical-distribution of results, answering the problem solution in constant time ) a COTS, which it is not, is it?