Implementing a hardware that divides an 8 bit number by 3 (11) in binary-CodePudding

I'm using Xilinx to create a schematic that divides any 8-bit number by 3! Does anyone have experience regarding this- anything would be helpful! Forexample- hardware takes two inputs (111101) and (11) and returns the division of two numbers which is 010100! I don't need to worry about remainder- just need quotient

CodePudding user response：

At least two approaches are suitable for HW implementation;

reciprocal multiplication

The reciprocal multiplication is computed from (x*K)>>N, where in this case K could be ceil(512/3) = 171 0b10101011 = 9 * 19, N = 9. The factorisation helps, since a number is easily multiplied both by 9 = 0b1001 and 19 = 0b10011. Multiplying by 9 is done by one addition and a free shift, multiplying that by 19 is done by 2 free shifts (wiring) and 2 additions. Total cost == 3 additions.

(non)-restoring division

Just as learned in elementary school, a long division can be easily converted to HW circuit.

00101001  = 41
11        -> trial "subtraction" fails, result bit = 0

00101001
 11       -> trial subtraction fails, result bit = 0

00101001
  11      -> trial subtraction fails = 0

00101001  -> pass = 1 
 (011)
------
   10001  -> pass
    11
   -----
     101  -> fail
     11
   -----
     101  -> pass
      11

Result = b0001101 = 13

One does not need to compute the actual trial subtractions, since the divisor is a constant. There's instead a quick expression Passed = top|(mid & bottom). And likewise one can form logical expressions for the three outputs at each stage.

Starting with top = 0, mid = input:7, bot = input:6, one needs to iterate over 7 positions, computing R=Result,Top,Mid from top, mid, bot.

The logical expressions for each stage are then

  top mid bot    R  T  M  B
 --------------------------
  0   0   0      0  0  0      R = t mb
  0   0   1      0  0  1      T = ~bm tb
  0   1   0      0  1  0      M = ~bt ~t~mb
  0   1   1      1  0  0      B = next bit from input
  1   0   0      1  0  1
  1   0   1      1  1  0
  1   1   0      x  x  x
  1   1   1      x  x  x      x = don't care as impossible

Comparison

The area estimates using typical FPGA cells would be something like 9 7 13 cells using the addition method and maybe 7x3 or 7x4 using long division.

Some FPGAs do have dedicated LUT sections, and in those cases a 256x7 bit LUT is probably a good solution. The minimum gate count would be 7 registers ( memory), but I would expect the memory access to bring in plenty of elements as a driver.

One can also implement 1x256 bit LUT using a hierarchy of 16-bit LUTs. 32-bit LUT = multiplex(bits[4], LUT0(bits[3:0]), LUT1(bits[3:0])); A single 256-entry LUT would require 16 16-bit LUTs and 15 multiplexers per bit, or 1680 LogicElements without advanced building blocks. This assumes that each LE can implement an arbitrary 16-bit LUT (4 inputs, 1 output), as it was customary already in late 90s. A multiplexer fits these contraints, as it's a simply a custom 3-input logic function, where as a 16-bit LUT is a custom 4-input logic function.

  --------------- ----------------- 
 | Method        |    Area         | 
  --------------- ----------------- 
 | 1. reciprocal |     ~30         |
 | 2. division   |   21-28         |
 | 3. 16-bit LUT |    1680         |
 | 4. 256x8-LUT  | 7-100   Memory  | 
  --------------- -----------------

Disclaimer: I would expect some advancement happening on logic synthesizers during the past 20 years -- one can probably generate a good divider using System-C. There might be even a possibility to lay out the logic cells and the connections manually -- with Altera I had no such luck, the logic, the placement and the routing was synthesised automatically from Verilog or VHDL, often with embarrassing results.

CodePudding user response：

A good compromise might be to use a bunch of look up tables. Pseudo code looks like this:

# look up table for the high part
# returns 7 bit quotient, 2 bit remainder
# in case you have 8 bit luts, you can save
# one LUT by replacing the high bit with x[2]&x[3]
lut_hi(x[4]) =
    0000 => 0000000:00
    0001 => 0000101:01
    0010 => 0001010:10
    0011 => 0010000:00
    0100 => 0010101:01
    0101 => 0011010:10
    0110 => 0100000:00
    0111 => 0100101:01
    1000 => 0101010:10
    1001 => 0110000:00
    1010 => 0110101:01
    1011 => 0111010:10
    1100 => 1000000:00
    1101 => 1000101:01
    1110 => 1001010:10
    1111 => 1010000:00

# look up table for the low part
# returns 3 bit quotient, 2 bit remainder
# you can once again save a bit by replacing
# the high bit with x[2]&x[3]
lut_lo(x[4]) =
    0000 => 000:00
    0001 => 000:01
    0010 => 000:10
    0011 => 001:00
    0100 => 001:01
    0101 => 001:10
    0110 => 010:00
    0111 => 010:01
    1000 => 010:10
    1001 => 011:00
    1010 => 011:01
    1011 => 011:10
    1100 => 100:00
    1101 => 100:01
    1110 => 100:10
    1111 => 101:00

# combine two remainders into 
remainders(a[2], b[2]) =
    return a[1]&b[1] | (a[0]|b[0])&(a[1]|b[1])

# divide by 3: look up the two halves, sum them, apply carry
div3(x[8]) =
    q_lo:r_lo = lut_lo(x[0:3])
    q_hi:r_hi = lut_hi(x[7:4])
    return q_lo   q_hi   remainders(r_lo, r_hi)

This code works by dividing the number into two nibbles, dividing each nibble by 3 and then summing the quotients.