What is fundamentally wrong in the choice of data structure in this matrix addition code?


Section 2.2.4 here contains the following:

2.2.4 Totally Inappropriate Data Structures

Some might find this example hard to believe. This really occurred in some code I’ve seen:

  (defun make-matrix (n m)
    (let ((matrix ()))
      (dotimes (i n matrix)
        (push (make-list m) matrix))))

  (defun add-matrix (m1 m2)
    (let ((l1 (length m1))
          (l2 (length m2)))
      (let ((matrix (make-matrix l1 l2)))
        (dotimes (i l1 matrix)
          (dotimes (j l2)
            (setf (nth i (nth j matrix))
                  (+ (nth i (nth j m1))
                     (nth i (nth j m2)))))))))

What’s worse is that in the particular application, the matrices were all fixed size, and matrix arithmetic would have been just as fast in Lisp as in FORTRAN.

This example is bitterly sad: The code is absolutely beautiful, but it adds matrices slowly. Therefore it is excellent prototype code and lousy production code. You know, you cannot write production code as bad as this in C.

Clearly, the author thinks that something is fundamentally wrong with the data structures used in this code. On a technical level, what has gone wrong? I worry that this question might be opinion-based, but the author's phrasing suggests that there is an objectively correct and very obvious answer.

CodePudding user response:

Lisp lists are singly linked, so random access to an element (via nth) requires traversing all of its predecessors. The storage is likewise wasteful for this purpose: each element costs a cons cell rather than one slot in a flat block. Working with matrices this way is very inefficient.

Lisp has built-in multidimensional arrays, so a natural fit for this problem would be a two-dimensional array, which can access elements directly instead of traversing links.
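
For contrast, here is a rough sketch of what array-based equivalents could look like (this is not the book's code; make-array and aref are the standard Common Lisp operators, and the function bodies are just illustrative):

  (defun make-matrix (n m)
    ;; A two-dimensional array: one flat block of storage,
    ;; constant-time access to any element.
    (make-array (list n m) :initial-element 0))

  (defun add-matrix (m1 m2)
    ;; Element-wise addition; aref does not traverse anything,
    ;; so this is O(n*m) regardless of the indices involved.
    (let* ((dims (array-dimensions m1))
           (result (make-array dims :initial-element 0)))
      (dotimes (i (first dims) result)
        (dotimes (j (second dims))
          (setf (aref result i j)
                (+ (aref m1 i j) (aref m2 i j)))))))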

CodePudding user response:

There's a strong assumption in numerical code that access to elements of matrices, or more generally arrays, is approximately constant-time. The time taken for a[n, m] should not depend on n and m. That's hopelessly untrue in this case, since, given the obvious definition of matrix-ref:

(defun matrix-ref (matrix n m)
  (nth m (nth n matrix)))

then, since nth takes time proportional to its first argument (more generally: accessing the nth element of a Lisp list takes time proportional to n + 1, counting from zero), the time taken by matrix-ref is proportional to the sum of the two indices (or, strictly, to the sum of the two indices plus one each).

This means that, for instance, almost any algorithm involving matrices will move up a time-complexity class. Element-wise addition of an n × m matrix, which should be O(nm) with constant-time access, becomes on the order of O(nm(n + m)) when every access goes through nth. That's bad.
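
For comparison, an accessor over a two-dimensional array built with make-array is constant-time (array-matrix-ref is just an illustrative name, not anything from the question):

  (defun array-matrix-ref (matrix n m)
    ;; aref indexes directly into the array's storage,
    ;; so the time taken does not depend on n or m.
    (aref matrix n m))

With that representation the constant-time-access assumption holds, and matrix addition stays at O(nm).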
