We will look at the steps required to design a DSM VLSI circuit, and we will use this description of the steps to understand the benefits of a flow in which many of these steps do not have to be taken because the path of IP reuse is followed.
Specification
At the specification level, the description of what one "wants" may have one of many possible formats, depending on needs and preferences. Although the specification could start out as a human-language description, it will generally be translated immediately into a more precise formulation.
Coding Into High-Level Behavioral Language
Once the specification is clear and understood by everybody, the "intent" needs to be translated into a format that is technically workable: it needs to be coded. The desired description could be in a high-level hardware description language (HDL), a flow diagram, a state diagram, etc. If it is in Verilog or VHDL, cycle-to-cycle clock activity will not be defined at this point, because the description is strictly behavioral and contains no timing information. It also does not have to be in a synthesizable subset of Verilog or VHDL, but it should adhere strictly to the IEEE standard.
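As a minimal sketch of what such a purely behavioral description might look like (the module, its function and the 8-bit widths are invented for illustration), consider a Verilog fragment that uses a behavioral delay and no clock; it is legal IEEE Verilog but lies outside the synthesizable subset:

    // Hypothetical behavioral-level description of a "larger of two
    // values" function.  The #5 delay and the absence of any clock are
    // allowed at this level but are not synthesizable.
    module max_beh (input      [7:0] a, b,
                    output reg [7:0] y);
      always @(a or b)
        #5 y = (a > b) ? a : b;   // behavioral delay, no cycle-to-cycle timing
    endmodule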
If the specification is in an HDL, a block diagram or a flow diagram, the behavioral code can be machine-generated in VHDL or Verilog. Because there is no link yet to any particular physical version, we still have the freedom to generate various functionally equivalent architectures. This is very efficient and extremely useful. However, machine-generated code contains no comment lines, and the resulting code will be difficult to interpret. Since reuse is these days the "flavor of the month", this may be a serious drawback, because it can make reuse, especially of archived code, difficult once the original designers have left the company. The alternative to machine-generated code is to write the code "by hand". In the interest of reuse, guidelines such as those suggested in the RMM [1] should be followed.
Now we may want to proceed with synthesis. If this is the case, the desired design must be coded in a synthesizable subset of behavior-level VHDL or Verilog. However, there is still no "attachment" to an actual physical implementation; it is a behavioral description.
Functional Verification
This highest, functional-level description of the desired project is the direct result of a specification that may have been written in a human language such as English. Thus, translating this intent into a more technical, more mathematical format may introduce ambiguities. However, no matter how the functional intent was specified, it needs to be verified - and the earlier, the better. A functional simulation is needed to make sure the ultimate product eventually does what it needs to do, at least functionally. To verify this functional-level specification, we need a functional simulator; there are several on the market. So far, the design does not include any timing information. As a result, any verification of the timing will have to be done later, although the desired speed has already been projected in the specification.
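To make this step concrete, here is a minimal, hypothetical functional testbench for the behavioral module sketched earlier; it checks only the function, and the only "timing" it contains is the artificial delay needed to let the behavioral model settle:

    // Hypothetical functional testbench: checks what the design does,
    // not when it does it.
    module max_beh_tb;
      reg  [7:0] a, b;
      wire [7:0] y;

      max_beh dut (.a(a), .b(b), .y(y));

      initial begin
        a = 8'd10;  b = 8'd20;  #10;
        if (y !== 8'd20)  $display("FAIL: expected 20, got %0d", y);
        a = 8'd200; b = 8'd3;   #10;
        if (y !== 8'd200) $display("FAIL: expected 200, got %0d", y);
        $display("functional simulation done");
        $finish;
      end
    endmodule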
Synthesis
This initial phase of synthesis (in a new design) has to do without very much timing data. After the functionality has been checked at the behavioral level, logic and test synthesis will be performed. As we pointed out, the coding must be in a synthesizable subset of Verilog or VHDL. If the data describing the design is behavior-level VHDL or Verilog, it can be translated into RTL code. The resulting RTL code is the preparation for synthesis. RTL code specifies clock-cycle-to-clock-cycle operations. It also becomes linked to an architecture, as opposed to behavior-level synthesis. Thus, architectural trade-offs have to happen before this step.
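For contrast with the behavioral sketch above, here is a hypothetical RTL version of the same function; the cycle-by-cycle behavior is now fixed (one registered result per rising clock edge) and the code stays within the synthesizable subset:

    // Hypothetical RTL description: clocked, resettable, synthesizable.
    module max_rtl (input            clk, rst_n,
                    input      [7:0] a, b,
                    output reg [7:0] y);
      always @(posedge clk or negedge rst_n)
        if (!rst_n)
          y <= 8'd0;               // asynchronous reset
        else
          y <= (a > b) ? a : b;    // one new result per clock cycle
    endmodule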
In essence, synthesis is the same as going to the "parts department"
(called the technology library in VLSI design), to find the physical
building blocks with which to put our system together. The range of
parts will generally run from large macros to standard cells to gates.
The granularity is, of course, much finer for silicon compilers.
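To illustrate what "picking parts from the technology library" produces, here is a hypothetical fragment of a technology-mapped netlist; the cell names XOR2_X1 and AND2_X1 are invented placeholders (modeled inline only to keep the fragment self-contained), not cells from any real library:

    // Invented placeholder "library cells", modeled here only so the
    // fragment is self-contained.
    module XOR2_X1 (input A, B, output Z);  assign Z = A ^ B;  endmodule
    module AND2_X1 (input A, B, output Z);  assign Z = A & B;  endmodule

    // Hypothetical gate-level netlist of a half adder after mapping.
    module half_adder_mapped (input a, b, output sum, carry);
      XOR2_X1 u1 (.A(a), .B(b), .Z(sum));
      AND2_X1 u2 (.A(a), .B(b), .Z(carry));
    endmodule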
Since we are now at an implementation level, with RTL code synthesized directly into gates or other building blocks, timing starts to play a role. We need to select from the technology library those components that have the proper timing characteristics for the entire chip. Since layout is so critical in DSM technologies, we need some estimates of the timing of the interconnects. Since there is no physical layout yet, the only timing data available at this point is what is generally referred to as a "statistical wire-load model". Such a model is an attempt to specify some timing before any floorplanning or placement has been done. Statistical models have no relationship to the design under consideration; they are based on past designs. There are few of them, and the technology is constantly and rapidly changing. This is like predicting stock prices based on past performance.
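As a rough sketch of how such a model is typically structured (the symbols below are illustrative, not taken from any particular library): the fanout $n$ of a net is mapped, via a table derived from earlier designs, to an estimated wire length, and per-unit-length parasitics of the target process turn that length into lumped values,

    $\hat{L}(n) = \ell_{\mathrm{table}}(n), \qquad
     \hat{C}_{\mathrm{net}} = c_w \, \hat{L}(n), \qquad
     \hat{R}_{\mathrm{net}} = r_w \, \hat{L}(n),$

where $c_w$ and $r_w$ are the capacitance and resistance per unit length. These lumped estimates stand in for real extracted parasitics until a layout exists.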
A better approach is often referred to as "custom wire models". With these models, interconnect timing estimates are based on projected physical placements of the building blocks in the present chip, the chip that is actually being designed. No routing has been done and no extractions have been done. These models are better than statistical wire-load models, but timing convergence is still highly unlikely. Since the routing of the interconnects has such a dramatic effect on timing, their accuracy is still seriously questionable.
Verification Between Levels of Abstraction
The design flow shown in Fig. 7.2 starts at a high level of abstraction, a behavioral or functional level, and proceeds towards a lower level of abstraction, eventually the physical layout. The translation between levels requires verification to make sure the initial intent is not lost in the process. This might best be done with formal verification between levels, where such a test is an equivalency test of logic functions.
Thus, verification steps are required both at certain levels of abstraction and between them. As pointed out earlier, besides the verification of the "translation" between all the levels of abstraction, there need to be verifications at some of the levels in the flow themselves. The highest level of verification is to check whether the system designed does what we want it to do. This will be done first at the functional level.
Floorplanning and Placement
We are now in the early stages of the physical layout of a chip. Fig.
7.2 suggests that floorplanning, placement and routing are separate
tasks. Ideally, these tasks should be done together, interactively.
This is not done in practice because each of these tasks is already extremely compute-intensive by itself. This is especially true for routing (discussed later). However, we will see in the discussion here that it is conceptually difficult to separate them, because the end result depends so much on their working well together.
With floorplanning, one tries to get an early idea of how the major blocks are going to fit together and of how the shapes and aspect ratios of the various blocks will affect putting the puzzle together. A critical question is how easily the blocks will interconnect. Are the contacts of intercommunicating blocks close to each other or not? Many blocks might want to use feed-throughs to ease the task.
Feed-throughs are much more important for DSM VLSI chips than for earlier processes.
If the floorplanning is done with manual interaction, visual aids such as a "rat's nest" display are used to get an indication of congestion and of the paths of the major interconnects.
The placement actually places the various building blocks and therefore determines dimensions such as the space available for the router to place the interconnects. The quality of a floorplan, in conjunction with the space reserved in the placement for the router, can make the difference between a good route, a bad route, or a route that does not even complete. It also has a big effect on timing in DSM-technology chips. After floorplanning, the relative positions of these blocks are set and we have a general idea of the lengths the interconnects will have.
Refined Synthesis
After floorplanning and placement, net loads are still estimates, but now they are based on the placement and orientation of the blocks. Using these data, a more refined synthesis can be performed. Net loads are back-annotated, and a more informed selection of cells can be made during synthesis. Net and cell delays may be specified in a format such as SDF (Standard Delay Format); net resistances and physical cluster and location information may be passed via PDEF (Physical Design Exchange Format).
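As a side note on how SDF data is consumed downstream: in a Verilog flow, delays from an SDF file can be applied to a gate-level model with the standard $sdf_annotate system task. A minimal, hypothetical sketch (the cell, the testbench and the file name "chip_routed.sdf" are all placeholders):

    // Invented placeholder cell with a specify block; the default delays
    // below are overridden by the values read from the SDF file.
    module buf_cell (input A, output Z);
      assign Z = A;
      specify
        (A => Z) = (0.1, 0.1);
      endspecify
    endmodule

    // Hypothetical testbench fragment that back-annotates the delays.
    module sdf_tb;
      reg  a;
      wire z;
      buf_cell dut (.A(a), .Z(z));

      initial begin
        $sdf_annotate("chip_routed.sdf", dut);  // IEEE 1364 system task
        a = 0; #1 a = 1; #1 $finish;
      end
    endmodule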
However, at this point it is still only the active parts of the circuit that have accurate delays. The net delays are still estimates, though educated ones.
Based on the available data, a timing analysis will show whether the timing is in the ballpark. If the timing is way off, routing - a very compute-intensive and time-consuming step - makes no sense. It will probably be better to consider rearchitecting the chip at this point to at least approach the desired timing. Of course, this decision is up to the designer.
Routing
Global routing (loose routing) determines the regions traversed by the wiring. Detailed routing determines the specific locations of the wires.
Routing and its success are highly contingent on the floorplanning and the placement. Timing-driven routing is desirable because of the challenges of DSM technologies. In addition to the space constraints on the router, this means the router must meet additional constraints that keep critical interconnects within certain delay limits once the routing is finished. Considering the complexity of the distributed RC loads of the interconnects and the fact that standard routing is already compute-intensive, this may be difficult to do well. However, it is one of the possibilities with today's latest tools.
Parasitic Extraction
Now we are at the point where we can determine, through a Layout Parasitic Extraction (LPE), all the information necessary to analyze the exact timing. The data will generally be specified in DSPF (Detailed Standard Parasitic Format). Extraction is also a very compute-intensive task. However, a lot depends on whether the layout data is hierarchical. It also depends on whether the extraction can be performed hierarchically, even for layout data that is hierarchical. Hierarchy in layouts was discussed in Chapters 2 and 5. Complexity and computational intensity also increase because, for DSM technologies, extraction in 3D is so important. We have seen in Chapter 3 how significant the 3D effects are and how they complicate things.
After the parasitic extraction, we can model the interconnects and determine the timing of the chip to see if we are close to the desired timing parameters. Now we can decide, based on realistic data, which situation we are facing.
The above steps are really what is called Final "Synthesis" in the flow in Fig. 7.2.
Fabrication and Test
What happens now really depends on what happened within the flow. It depends a lot on the changes that had to be made to meet the timing requirements of the design.
The big question now is: Have any of the changes required after the third step in the flow (Functional Verification) affected the functionality in any way, and how can we be sure that they did not? If functional changes could have happened, both functional simulation and ATPG (Automatic Test Pattern Generation) need to be redone. Such steps would involve major investments in engineering effort and time. Also, test synthesis gets into the picture because it affects the timing (the capacitive loading) of the design. It might be reasonable to wait with ATPG until the physical design is complete. Test patterns are not needed until the end, anyway.
Thus, when it comes to circuits designed for DSM technologies, we need to be vigilant about when we really know that a chip is ready for fabrication based on simulation results, and about which test vectors to use to guarantee the required fault coverage. The functional simulation can be done repeatedly in the flow, with more and more definitive results. After all, only the last functional simulation is the basis for a sign-off. Generating a good set of test vectors is very time-consuming, but it needs to be done with diligence and as late in the flow as possible.
The only problem with late test-vector generation is discovering that the present design cannot guarantee the required coverage, or that the test requires too many vectors and, therefore, too much time. Then a redesign with scan insertion may be needed, which will greatly change