Home > Net >  Cosine Similarity in linear time using Lisp
Cosine Similarity in linear time using Lisp

Time:12-09

The cosine similarity of two lists can be calculated in linear time using a for-loop. I'm curious as to how one would achieve this using a Lisp-like language. Below is an example of my code in Python and Hy (Hylang).

Python:

def cos_sim(A,B):
    import math as _math
    n,da,db,d = 0,0,0,0

    for a,b in zip(A,B):
        n  = a*b
        da  = a*a
        db  = b*b

    da = _math.sqrt(da)
    db = _math.sqrt(db)
    d = da*db

    return n / (d   1e-32)

Hy (Lisp):

(import math)

(defn l2norm [a]
    (math.sqrt (reduce   (map (fn [s](* s s)) a))))

(defn dot [a b]
   (reduce   (map * a b)))

(defn cossim [a b]
   (/ (dot a b) (* (l2norm a) (l2norm b))))

CodePudding user response:

A simple option is to translate the Python version literally to Hy, like this:

(defn cos_sim [A B]
  (import math :as _math)
  (setv [n da db d] [0 0 0 0])

  (for [[a b] (zip A B)]
    ( = n (* a b))
    ( = da (* a a))
    ( = db (* b b)))

  (setv
    da (_math.sqrt da)
    db (_math.sqrt db)
    d (* da db))

  (/ n (  d 1e-32)))

CodePudding user response:

"I'm curious as to how one would achieve this using a Lisp-like language." This really depends on which Lisp you are using. In Scheme you might do something similar to the posted Hy solution:

(define (cos-sim-1 u v)
  (/ (dot-prod u v)
     (* (norm u) (norm v))))

(define (dot-prod u v)
  (fold-left   0 (map * u v)))

(define (norm u)
  (sqrt (fold-left (lambda (acc x) (  acc (* x x)))
                   0
                   u)))

This is linear in time complexity, but it could be improved by a constant factor by passing over the input only once. Scheme provides a named let construct that can be used to bind a name to a procedure; this is convenient here as it provides a simple mechanism for building the dot product and norms:

(define (cos-sim-2 u v)
  (let iter ((u u)
             (v v)
             (dot-product 0)
             (U^2 0)
             (V^2 0))
    (if (null? u)
        (/ dot-product (sqrt (* U^2 V^2)))
        (let ((x (car u))
              (y (car v)))
          (iter (cdr u)
                (cdr v)
                (  dot-product (* x y))
                (  U^2 (* x x))
                (  V^2 (* y y)))))))

Both of these procedures assume that the input lists have the same length; it might be useful to add some validation code that checks this. Note that fold-left is standard in R6RS Scheme, but other standards rely on SRFIs for this, and some implementations may use different names, but the fold-left functionality is commonly available (perhaps as foldl or reduce).

It is possible to solve the problem in Common Lisp using either of the basic methods shown above, though in Common Lisp you would use labels instead of named let. But it would be typical to see a Common Lisp solution using the loop macro. The Common Lisp standard does not guarantee tail call elimination (though some implementations do support that), so explicit loops are seen much more often than in Scheme. The loop macro is pretty powerful, and one way that you could solve this problem while passing over the input lists only once is this:

(defun cos-sim (u v)
  (loop :for x :in u
        :for y :in v
        :sum (* x y) :into dot-product
        :sum (* x x) :into u2
        :sum (* y y) :into y2
        :finally (return (/ dot-product (sqrt (* u2 y2))))))

Here are some sample interactions:

Scheme (Chez Scheme):

> (cos-sim-1 '(1 0 0) '(1 0 0))
1
> (cos-sim-1 '(1 0 0) '(-1 0 0))
-1
> (cos-sim-1 '(1 0 0) '(0 1 0))
0
> (cos-sim-1 '(1 1 0) '(0 1 0))
0.7071067811865475

> (cos-sim-2 '(1 0 0) '(1 0 0))
1
> (cos-sim-2 '(1 0 0) '(-1 0 0))
-1
> (cos-sim-2 '(1 0 0) '(0 1 0))
0
> (cos-sim-2 '(1 1 0) '(0 1 0))
0.7071067811865475

Common Lisp:

CL-USER> (cos-sim '(1 0 0) '(1 0 0))
1.0
CL-USER> (cos-sim '(1 0 0) '(-1 0 0))
-1.0
CL-USER> (cos-sim '(1 0 0) '(0 1 0))
0.0
CL-USER> (cos-sim '(1 1 0) '(0 1 0))
0.70710677

CodePudding user response:

I think your proposed solution is fairly 'lispy': build several short, easy to read functions that combine into your solution. EG:

(defun n (A B)
    (sqrt (reduce #'  (map 'list #'* A B))))

(defun da (A)
    (sqrt (reduce #'  (map 'list #'* A A))))

(defun db (B)
    (sqrt (reduce #'  (map 'list #'* B B))))

(defun cos-sim (A B)
    (let ((n (n A B))
          (da (da A))
          (db (db B)))
      (/ (* n n) (  (* da db) 1e-32)))

But, notice that n, da, and db look very similar. We can see if we can make those a single function, or macro. In this case, a function with an optional second list parameter is easy enough. (And note that I've defined n in a slightly weird way to emphasize this, but we might prefer not to take a square root and then square it for our final calculation. This would be easy to change by checking for passing the optional parameter (included as B-p below); I chose to move the square root inside the combined function) Anyway, this gives us:

 (defun d (A &optional (B A B-p))
    (reduce #'  (map 'list #'* A B)))
 
 (defun cos-sim (A B)
    (let ((n (d A B))
          (da (sqrt (d A)))
          (db (sqrt (d B))))
      (/ n (  (* da db) 1e-32))))

Alternately, using Loop is very Common Lisp-y, and is more directly similar to the python:

(defun cos-sim (A B)
    (loop for a in A
          for b in B
          sum (* a b) into n
          sum (* a a) into da
          sum (* b b) into db
          finally (return (/ n (  (sqrt (* da db)) 1e-32)))))

CodePudding user response:

Here is a fairly natural (I think) approach in Racket. Essentially this is a process of folding a pair of sequences of numbers, so that's what we do. Note that this uses no explicit assignment, and also pulls the square root up a level (sqrt(a) * sqrt(b) = sqrt(a*b) as taking roots is likely expensive (this probably does not matter in practice). It also doesn't do the weird adding of a tiny float, which I presume was an attempt to coerce a value which might not be a float to a float? If so that's the wrong way to do that, and it's also not needed in a language like Racket (and most Lisps) which strive to do arithmetic correctly where possible.

(define (cos-sim a b)
  ;; a and b are sequences of numbers
  (let-values ([(a^2-sum b^2-sum ab-sum)
                (for/fold ([a^2-running 0]
                           [b^2-running 0]
                           [ab-running 0])
                          ([ai a] [bi b])
                  (values (  (* ai ai) a^2-running)
                          (  (* bi bi) b^2-running)
                          (  (* ai bi) ab-running)))])
    (/ ab-sum (sqrt (* a^2-sum b^2-sum)))))

You can relatively easily turn this into typed Racket:

(define (cos-sim (a : (Sequenceof Number))
                 (b : (Sequenceof Number)))
  : Number
  (let-values ([(a^2-sum b^2-sum ab-sum)
                (for/fold ([a^2-running : Number 0]
                           [b^2-running : Number 0]
                           [ab-running : Number 0])
                          ([ai a] [bi b])
                  (values (  (* ai ai) a^2-running)
                          (  (* bi bi) b^2-running)
                          (  (* ai bi) ab-running)))])
    (/ ab-sum (sqrt (* a^2-sum b^2-sum)))))

This probably is no faster, but it is fussier.

This might be faster though:

(define (cos-sim/flonum (a : (Sequenceof Flonum))
                        (b : (Sequenceof Flonum)))
  : Flonum
  (let-values ([(a^2-sum b^2-sum ab-sum)
                (for/fold ([a^2-running : Flonum 0.0]
                           [b^2-running : Flonum 0.0]
                           [ab-running : Flonum 0.0])
                          ([ai a] [bi b])
                  (values (  (* ai ai) a^2-running)
                          (  (* bi bi) b^2-running)
                          (  (* ai bi) ab-running)))])
    (/ ab-sum (assert (sqrt (* a^2-sum b^2-sum)) flonum?))))

I have not checked it is however.

  • Related