Home > OS >  Finding the most frequent character in a string using binary search
Finding the most frequent character in a string using binary search

Time:02-11

Suppose you are given a string of length n and there are 26 ASCII characters that can constitute the string. Identical characters are grouped together in the string (e.g aaabbbcccddd). Find the most frequent character.

The brute force is to scan the entire string and iteratively count the number of characters which takes O(n) time with O(1) space. Is there a O(log n) solution for this? I was given a hint to binary search to find the partition that contains the most frequent character, but I am not sure how binary search would help me eliminating the half of the string.

CodePudding user response:

As your characters are grouped the problem is reduced to finding longest sequence of consequent letters...

So yes you can use binary search to find the size of each sequence which is O(m.log(n)) where m<=n is the number of sequences (distinct letters). As you stated m<=26 you can consider it as constant leading to O(log(n))

The algo would be something like this:

  1. consider your string is char s[n]; and result int l=0;

  2. for (i=0;i<n;)

  3. binary search j position

    of last character the where s[i]==s[j] where i <= j < n

  4. remember max l=max(l , j 1-i)

  5. i=j 1

  6. continue the loop from #2

However note that any letter can have only single sequence so strings like abaaaaaa will not work.

CodePudding user response:

You can use binary search to find the last occurrence (or rightmost occurrence) of a character in a sub-range of a string to solve this problem.

You use it to hop across the input string, from character to character, no matter how often the same character is being repeated. While you hop, you remember the most frequent character so far and how often it was repeated.

middle( left : Integer, right : integer) : (mid : Integer, valid : Boolean)

  1. let mid = left floor(right - left, 2)
  2. return (mid, left != mid)

find-rightmost(s : String, c : Char, left : Integer, right : Integer) : (lastIndex : Integer, found : Boolean)

  1. if left >= right then return (left, false) ;; empty range case
  2. if s[right] = c then return (right, true) ;; spans whole range case
  3. let (mid, valid) = middle(left, right) ;; mid is invalid if left = middle
  4. if valid AND s[mid] = c then find-rightmost(s, c, mid, right)
    else if valid then find-rightmost(s, c, left, mid)
    else return (left, s[left] = c)

find-rightmost is classic binary search in recursive form.

most-frequent-char(s : String) : (c : Char, frequency : Integer)?

  1. let n = length(s)
  2. let nm1 = n - 1
  3. if n = 0 then return nil ;; or NULL of nullptr or whatever you prefer
    else if n = 1 then return (s[0], 1) ;; trivial string of length 1
    else
    let mutable i = 0
    let mutable maxf = 0
    let mutable cmax = '-'
    while i < nm1
    let (ilast, found) = find-rightmost(s, s[i], i 1, nm1)
    ~~~ update max, cmax when suitable
    if found then
    i := ilast 1
    else
    i := i 1
    end while
    return (cmax,maxf)

This yields a total complexity of about O(m*log(n)) where m is the number of disjoint characters.

Since pseudo code cannot easily be verified to actually "work", here the same in Common Lisp:

(defparameter *test-cases*
  '(""
    "a"
    "ab"
    "aabcd"
    "aaaaaaaaaa"
    "abbbbbbbcccdd"))

(defun middle (left right)
  (let ((mid (  left (floor (- right left) 2))))
    (values mid (/= mid left))))

(defun find-rightmost (s c left right)
  (if (< left right)
      (multiple-value-bind (mid valid) (middle left right)
    (cond
      ((char-equal c (aref s right))
       (values right t))
      ((and valid (char-equal c (aref s mid)))
       (find-rightmost s c mid right))
      (valid
       (find-rightmost s c left mid))
      (t (values left (char-equal c (aref s left))))))
      (values left nil)))

(defun most-frequent-char (s)
  (let* ((n (length s))
     (n-1 (- n 1)))
    (cond
      ((= n 0)
       nil)
      ((= n 1)
       (list :char (aref s 0) :freq 1))
      (t ;; string length > 1 - we actually have to work...
       (loop
     with i = 0
     with cmaxf = nil
     with nmax = 0
     while (< i n-1)
     for c = (aref s i)
     do (multiple-value-bind (ilast found)
        (find-rightmost s c (  i 1) n-1)
          (if found
          (let ((freq (  1 (- ilast i))))
            (when (> freq nmax)
              (setf nmax freq)
              (setf cmaxf c))
            (setf i (  ilast 1)))
          (progn
            (when (= 0 nmax)
              (setf nmax 1)
              (setf cmaxf c))
            (incf i))))
     finally (return (list :char cmaxf :freq nmax)))))))
          
(defun run-tests ()
  (mapcar #'(lambda (in)
          (list in (most-frequent-char in)))
      *test-cases*))

CL-USER> (format t "~{~S~%~}~%" (run-tests))
("" NIL)
("a" (:CHAR #\a :FREQ 1))
("ab" (:CHAR #\a :FREQ 1))
("aabcd" (:CHAR #\a :FREQ 2))
("aaaaaaaaaa" (:CHAR #\a :FREQ 10))
("abbbbbbbcccdd" (:CHAR #\b :FREQ 7))

  • Related