A Dual-Port Content Addressable Memory
Design Review #1: Design Proposal


André Mathieu, Brian Magnuson, Andrew Wolan

1. Function:

The attached diagram illustrates the main flow of data in the project, with some internal control lines omitted. The I/O of the chip will be provided by a pair of unidirectional 32-bit lines that are presumably latched off chip, so that no internal registers are required to hold the data. If that is not the case, we may need to add a pair of 32-bit registers to handle the I/O to our block.
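
To make the intended behaviour concrete, the following Python sketch models the block at a purely behavioural level; it is not a description of the circuit. The class name and the search/write interface are illustrative assumptions on our part; the 32-bit word width comes from the specifications and the 1024-entry depth is implied by the 1024-to-10 priority encoder described later.

    # Behavioural sketch only (not the circuit): a CAM with a write port driven
    # by an address counter and a search port that evaluates every entry at once.
    class DualPortCAM:
        def __init__(self, depth=1024, width=32):
            self.depth = depth
            self.mask = (1 << width) - 1
            self.mem = [None] * depth        # storage array
            self.counter = 0                 # write-address counter

        def write(self, word):
            """Store a word at the address given by the counter, then advance it."""
            self.mem[self.counter] = word & self.mask
            self.counter = (self.counter + 1) % self.depth

        def search(self, key):
            """Return the addresses whose contents match the key (the match lines)."""
            key &= self.mask
            return [addr for addr, word in enumerate(self.mem) if word == key]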


2. Justification:

There are two main reasons for implementing a CAM (Content Addressable Memory) as a VLSI design. First and foremost, hardware is superior to software in this application: every location in memory can be searched in parallel in constant time, whereas software can at best perform the same operation in logarithmic time. Furthermore, implementing a CAM in VLSI rather than in some other type of hardware, such as an FPGA, is desirable because of the rather large amount of interconnect required, which is in short supply in an FPGA and would make it an uneconomical solution.
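
To make the complexity comparison concrete, here is a minimal sketch of the software best case, assuming the data is already kept sorted; the function name is ours. Even then a lookup needs on the order of log2(1024) = 10 sequential comparisons, whereas the CAM's match lines evaluate every entry simultaneously.

    # Software best case: binary search over a sorted table, O(log n) per lookup.
    import bisect

    def software_lookup(sorted_words, key):
        """About log2(len(sorted_words)) sequential comparisons per lookup."""
        i = bisect.bisect_left(sorted_words, key)
        if i < len(sorted_words) and sorted_words[i] == key:
            return i
        return None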


3. Background:

Because we intend to operate our CAM in a non-standard manner, we do not have much prior art to refer to. There was a more traditional implementation of a CAM designed by one of the groups in last year's class that we had hoped to use as a reference, but their design was removed from the web. We found other references to CAM designs on the web, but they were not useful because their implementations are all quite different from our proposed design.


4. Specifications

The only specifications we have received at this point are the width of the data we will be handling and the system clock rate. We will be dealing with 32 bits of data and a 50 MHz clock. We were told to make the system as fast as possible. Our goal is to perform reads and writes in one clock cycle. We plan to accomplish this by using both the rising and falling edges of the system clock to gate events within our system. The critical path will be in the priority encoder. Based on the priority scheme that we select, the encoder will have to evaluate and determine which tag to present to the row decoder in less than one half of a clock cycle, or approximately 10 ns. We feel that this is a realistic goal, but we may have to implement some sort of pipeline if it cannot be reached.
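
The half-cycle budget and a behavioural view of the encoder are sketched below. The lowest-address-wins priority is only an assumption for illustration, since we have not yet selected the actual priority scheme.

    # Half-cycle budget: 1 / 50 MHz = 20 ns per cycle, so roughly 10 ns per phase.
    CLOCK_HZ = 50e6
    HALF_CYCLE_NS = 0.5 / CLOCK_HZ * 1e9      # = 10.0 ns

    def priority_encode(match_lines):
        """Behavioural 1024-to-10 priority encoder.
        Assumes the lowest matching address wins; the real scheme is still open."""
        for addr, hit in enumerate(match_lines):
            if hit:
                return addr                   # 10-bit tag for the row decoder
        return None                           # no match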


5. Technology, Die Size, Package:

We estimate that we will use approximately 245,000 transistors in our design (see the breakdown in the following section). We had hoped to use .5u technology in order to help meet our speed requirements. Based on our last communication from Professor Burleson, we may have to use 1.2u technology. This could affect our design and require us to pipeline our inputs or use two clock cycles to process data. Based on designs from last semester, we estimate that the average transistor in our system will occupy approximately 12 x 2 lambda. This works out to roughly 5.9E6 square lambda for the transistors alone! Estimating another 50% for interconnect, our design approaches 8.8E6 square lambda!
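
As a quick check of the figures above (using the same rough per-transistor area, which is itself only an estimate):

    # Back-of-envelope area estimate from the numbers quoted above.
    transistors = 245_000
    area_per_transistor = 12 * 2                   # square lambda, rough average
    core = transistors * area_per_transistor       # about 5.9e6 square lambda
    total = core * 1.5                             # +50% interconnect, about 8.8e6
    print(core, total)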

There are two paths on which we will have to concentrate most of our attention. First, during the high phase of the clock, the match lines must evaluate and feed the priority encoder. This value must then pass through a multiplexer and settle at the register in time for the falling edge of the clock. Second, during the low phase, the data from this register is passed to the column decoder, which selects a word to be sensed and driven to the outputs before the next rising edge.
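
A behavioural sketch of these two phases follows; the function names and data types are ours, and the multiplexer selects and sense amplifiers are abstracted away.

    # Behavioural sketch of the two half-cycle phases (names are ours).
    def high_phase(match_lines, register):
        """High phase: match lines evaluate, the winning tag passes through the
        multiplexer and is latched into the register on the falling edge."""
        register['tag'] = next((a for a, hit in enumerate(match_lines) if hit), None)
        return register

    def low_phase(memory, register):
        """Low phase: the registered tag drives the column decoder and the
        selected word is sensed and driven to the outputs."""
        tag = register['tag']
        return None if tag is None else memory[tag]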

Our design will use the following pins:


6. Design Style

We estimate the following transistor counts for each component:

  • CAM
          1024 x 7 Tag cells @ 18 transistors / cell = 128,304
          1024 x 8 Data cells @ 12 transistors / cell = 98,304
          2 x 1024 Isolation transistors = 2,048
    Memory total = 228,656

  • Data Path
          2 x 10-bit 4-to-1 Multiplexers = 280
          2 x 10-bit 3-to-1 Multiplexers = 240
          2 x 1024-to-10 Priority Encoders = ~3,000
          2 x 10-to-1024 Column Decoders = 12,448
          2 x 10-bit Registers = 160
          24 Tri-state buffers with precharge = 120
    Data Path total = 16,248

  • Address Counter (Random Logic)
          2 x 10-bit Counters = 200

This totals to approximately 245,000 transistors, as tallied in the sketch below.

These estimates are conservative and we expect them to be revised downward.
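
For reference, the listed figures tally as follows (values copied directly from the list above):

    # Tally of the transistor estimates listed above.
    memory   = 128_304 + 98_304 + 2_048                # CAM block
    datapath = 280 + 240 + 3_000 + 12_448 + 160 + 120  # data path
    counter  = 200                                     # address counter
    print(memory, datapath, memory + datapath + counter)   # 228656 16248 245104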

7. Partitioning

This design project lends itself very well to partitioning. Brian Magnuson will design and simulate the memory blocks along with their input buffers. Andrew Wolan will design the counter that determines the write addresses as well as the priority encoder. André Mathieu will design the row decoders, multiplexers and the remaining registers. The design reports will be a collaborative effort, with each team member responsible for the information related to their part of the project. We will edit the reports as a group in order to maintain a consistent writing style.

8. Schedule

The blocks of this design can all be simulated independently. As such, each member of the group can proceed with their portion of the design, with April 7 as the target date for combining the blocks and testing the system. The memory module and the priority encoder will require the most simulation using Spice to determine the critical paths and speeds of the design.

9. Test Chip Tapeout and Fabrication

This design is very scalable. The concepts can be applied to a design using many fewer CAM cells, which would require smaller versions of each of the component blocks. This new, smaller design could be implemented in the footprint of the MOSIS 4,000 square lambda 1.2u technology.
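
As a rough illustration of that scalability, the sketch below scales the section 6 figures down to a hypothetical 64-entry test chip. The 64-entry depth and the assumption that the data path shrinks roughly in proportion to depth are ours, for illustration only, not part of the design.

    # Crude scaling of the section 6 estimates to a smaller test chip.
    def scaled_estimate(depth, tag_bits=7, data_bits=8):
        memory = depth * tag_bits * 18 + depth * data_bits * 12 + 2 * depth
        datapath = int(16_248 * depth / 1024)   # assume roughly proportional scaling
        counter = 200
        return memory + datapath + counter

    print(scaled_estimate(64))                  # about 15,500 transistors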