7.3.1 DESIGN FROM SCRATCH

We will walk through the steps required to design a DSM VLSI circuit from scratch. Understanding what each of these steps involves will make clear the benefits of a flow in which most of them do not have to be taken, namely the path of IP reuse.

Specification
At the specification level, the description of what one "wants" may take one of many possible formats, depending on needs and preferences. Although the specification could start out as a human-language description, it will generally be translated immediately into a more precise formulation.

Coding Into High-Level Behavioral Language
Once the specification is clear and understood by everybody, the "intent" needs to be translated into a format that is technically workable. It needs to be coded. The desired description could be in a high-level hardware description language (HDL), a flow diagram, a state diagram, etc. If it is in Verilog or VHDL, cycle-to-cycle clock activity will not be defined at this point because the description is strictly behavioral and contains no timing information. It also does not have to be in a synthesizable subset of Verilog or VHDL, but it should adhere strictly to the IEEE standard.
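
As a minimal sketch of what such a description might look like, consider the following hypothetical Verilog fragment. It states what is computed, but says nothing about clocks or cycle-by-cycle activity:

  // Behavioral sketch (hypothetical example): a two-input sorter.
  // No clock appears anywhere, so cycle-to-cycle activity is
  // deliberately left undefined at this level.
  module sorter (input [7:0] a, b, output reg [7:0] lo, hi);
    always @(a or b)
      if (a < b) begin lo = a; hi = b; end
      else       begin lo = b; hi = a; end
  endmodule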

If the specification is in a high-level HDL, a block diagram, or a flow diagram, the behavioral VHDL or Verilog code can be machine-generated. Because there is no link yet to any particular physical version, we still have the freedom to generate various functionally equivalent architectures. This is very efficient and extremely useful. However, machine-generated code contains no comment lines, and the resulting code is difficult to interpret. Since reuse is these days the "flavor of the month", this may be a serious drawback, because it can make reuse, especially of archived code, difficult once the original designers have left the company. The alternative to machine-generated code is to write the code "by hand". In the interest of reuse, guidelines such as those suggested in the RMM [1] should be followed.
Now we may want to proceed with synthesis. If so, the desired design must be coded in a synthesizable subset of behavior-level VHDL or Verilog. However, there is still no "attachment" to an actual physical implementation; it is a behavioral description.
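
The following hypothetical contrast illustrates what the synthesizable-subset restriction means in practice. A literal "#" time delay is legal Verilog for simulation but has no hardware meaning, so a synthesis tool rejects it; the same one-stage delay must instead be expressed as an explicit register:

  module ack_gen (input clk, input req, output reg ack);
    // Simulation-only version, outside the synthesizable subset:
    //   always @(req) #5 ack = req;
    // Synthesizable version: the delay becomes a clocked register.
    always @(posedge clk)
      ack <= req;
  endmodule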

Functional Verification
This highest-level functional description of the desired project is the direct result of a specification that may have been written in a human language such as English. Thus, translating this intent into a more technical, more mathematical format may introduce ambiguities. However, no matter how the functional intent was specified, it needs to be verified, and the earlier, the better. A functional simulation is needed to make sure the ultimate product eventually does what it needs to do, at least functionally. To verify this functional-level specification, we need a functional simulator; there are several on the market. So far, the design does not include any timing information. As a result, any verification of the timing will have to be done later, although the desired speed has already been projected in the specification.
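
As a minimal sketch, a functional testbench for the hypothetical sorter module shown earlier could look as follows. Note that it checks behavior only; nothing about timing is verified here:

  // Functional testbench sketch (hypothetical): applies one stimulus
  // and checks the result. Purely functional; no timing is checked.
  module tb;
    reg  [7:0] a, b;
    wire [7:0] lo, hi;
    sorter dut (.a(a), .b(b), .lo(lo), .hi(hi));
    initial begin
      a = 8'd9; b = 8'd3;
      #1;  // let the combinational logic settle
      if (lo !== 8'd3 || hi !== 8'd9) $display("FAIL");
      else                            $display("PASS");
      $finish;
    end
  endmodule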

Synthesis
This initial phase of synthesis (in a new design) has to do without very much timing data. After checking the functionality at the behavior level, logic and test synthesis will be performed. As we pointed out, the code must be in a synthesizable subset of Verilog or VHDL. If the data describing the design is behavior-level VHDL or Verilog, it can be translated into RTL code. The resulting RTL code is the preparation for synthesis. RTL code specifies clock-cycle-to-clock-cycle operations. It also becomes linked to a particular architecture, as opposed to a behavior-level description. Thus, architectural trade-offs have to happen before this step.
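
A hypothetical RTL fragment makes the difference visible. Unlike the behavioral sorter above, every register transfer here is bound to a clock edge, so the cycle-by-cycle behavior, and with it one particular architecture, is now fixed:

  // RTL sketch (hypothetical): an accumulator. Each clock edge
  // performs exactly one register transfer, so the architecture
  // (one adder, one 16-bit register) is now committed.
  module acc (input clk, input rst, input [7:0] din,
              output reg [15:0] sum);
    always @(posedge clk)
      if (rst) sum <= 16'd0;       // synchronous clear
      else     sum <= sum + din;   // one addition per clock cycle
  endmodule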

In essence, synthesis is the same as going to the "parts department" (called the technology library in VLSI design) to find the physical building blocks with which to put our system together. The range of parts will generally run from large macros to standard cells to gates. The granularity is, of course, much finer for silicon compilers.
Since we are now at an implementation level, with RTL code synthesized directly into gates or other building blocks, timing starts to play a role. We need to select those components from a technology library that have the proper timing characteristics for the entire chip. Since layout is so critical in DSM technologies, we need some estimates of the timing of the interconnects. Since there is no physical layout yet, the only timing data available at this point is what is generally referred to as a "statistical wire-load model". Such a model is an attempt to specify some timing before any floorplanning or placement. These statistical models have no relationship to the design under consideration: they are based on past designs, there are few of them, and the technology is constantly and rapidly changing. This is like predicting stock prices based on past performance.
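
Concretely, a statistical wire-load model is little more than a lookup table in the technology library that maps a net's fanout to an estimated wire length, which is then scaled by per-unit resistance and capacitance. The fragment below is an illustrative sketch in the style of a Liberty library entry; the name and all numbers are invented:

  /* Illustrative wire-load table (values invented). A net with
     fanout 2 is assumed to be 2.6 length units long; its R and C
     are then estimated from the per-unit-length coefficients. */
  wire_load ("typical_50k_gates") {
      resistance  : 0.0008 ;       /* R per unit length */
      capacitance : 0.0002 ;       /* C per unit length */
      slope       : 1.5 ;          /* extrapolation beyond the table */
      fanout_length (1, 1.4) ;
      fanout_length (2, 2.6) ;
      fanout_length (3, 3.9) ;
  }
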
A better approach is often referred to as "custom wire models". With these models, interconnect timing estimates are based on projected physical placements of the building blocks in the present chip, the chip that is actually being designed. However, no routing has been done and no extractions have been done. These models are better than statistical wire-load models, but timing convergence is still highly unlikely. Since the routing of the interconnects has such a dramatic effect on timing, their accuracy is still seriously questionable.

Verification Between Levels of Abstraction
The design flow shown in Fig. 7.2 starts at a high level of abstraction, a behavioral or functional level, and proceeds towards a lower level of abstraction, eventually the physical layout. The translation between levels requires verification to make sure the initial intent is not lost in the process. This is best done with formal verification between levels, where such a test is an equivalence check of logic functions.
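
For example, a formal equivalence checker would be expected to prove, without any simulation vectors, that the two hypothetical descriptions below implement the same logic function:

  // Hypothetical equivalence-checking example: an RTL multiplexer
  // and a gate-level version of it. A formal tool proves that the
  // two compute identical outputs for all inputs.
  module mux_rtl (input a, b, s, output y);
    assign y = s ? b : a;
  endmodule

  module mux_gate (input a, b, s, output y);
    wire ns, t0, t1;
    not (ns, s);
    and (t0, a, ns);   // selects a when s = 0
    and (t1, b, s);    // selects b when s = 1
    or  (y, t0, t1);
  endmodule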

Thus, the following steps are required at some levels and between levels of abstraction:

  1. Create the intended design at a certain level of abstraction.
  2. Verify the desired function at that level.
  3. Translate to the lower level.
  4. Verify consistency throughout these three steps.

As pointed out in step 2, verification is also needed at some levels in the flow, besides the verification of the "translation" between all the levels of abstraction. The highest level of verification is to check whether the system designed does what we want it to do. This will be done first at the functional level.

Floorplanning and Placement
We are now in the early stages of the physical layout of a chip. Fig. 7.2 suggests that floorplanning, placement and routing are separate tasks. Ideally, these tasks should be done together, interactively.

This is not done in practice because each of these tasks is already extremely compute-intensive by itself. This is especially true for routing (discussed later). However, we will see in the discussion here that it is conceptually difficult to separate them, because the end result depends so much on their working well together.

With floorplanning, one tries to get an early idea of how the major blocks are going to fit together and how the shapes and aspect ratios of the various blocks will affect putting together the puzzle. A critical question is how easily the blocks will interconnect. Are the pins of intercommunicating blocks close to each other or not? Many blocks might want to use feed-throughs to ease the task.

Feed-throughs are much more important for DSM VLSI chips than for earlier processes.

If the floorplanning is done with manual interaction, visual aids such as a rat's-nest display are used to get an indication of congestion and of the paths of the major interconnects.

The placement actually positions the various building blocks and therefore determines such dimensions as the space available for the router to place the interconnects. The quality of the floorplan, in conjunction with the space reserved during placement for the router, can make the difference between a good route, a bad route, or a route that does not complete at all. It also has a big effect on timing in DSM technology chips. After floorplanning and placement, the relative positions of these blocks are set and we have a general idea of the lengths the interconnects will have.

Refined Synthesis
After floorplanning and placement, net loads are still estimates, now based on the placement and orientation of the blocks. Using these data, a more refined synthesis can be performed. Net loads are back-annotated, and a more informed selection of cells can be made during synthesis. Net and cell delays may be specified in a format such as SDF (Standard Delay Format); net resistances and physical cluster and location information may be passed via PDEF (Physical Data Exchange Format).
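
As an illustrative sketch (instance names and delay values invented), an SDF file carrying such back-annotation data pairs each cell or net with min:typ:max delay triples:

  (DELAYFILE
    (SDFVERSION "3.0")
    (DESIGN "top")
    (TIMESCALE 1ns)
    (CELL (CELLTYPE "NAND2") (INSTANCE u1)
      (DELAY (ABSOLUTE
        (IOPATH A Y (0.12:0.15:0.21) (0.14:0.18:0.25))
      ))
    )
    (CELL (CELLTYPE "top") (INSTANCE)
      (DELAY (ABSOLUTE
        (INTERCONNECT u1/Y u2/A (0.05:0.08:0.13))
      ))
    )
  )

The IOPATH entry annotates a cell's pin-to-pin rise and fall delays, while the INTERCONNECT entry annotates a net delay; in a Verilog simulation flow such a file is typically applied with the $sdf_annotate system task.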

However, at this point, it is still only the active parts of the circuit that have accurate delays. The net delays are still an estimate, though an educated estimate.

Based on the available data, a timing analysis will show whether the timing is in the ballpark. If the timing is way off, routing, a very compute-intensive and time-consuming step, makes no sense. It will probably be better to consider rearchitecting the chip at this point to at least approach the desired timing. Of course, this decision is up to the designer.

Routing
Global routing (loose wiring) determines the regions traversed by the wiring. Detailed routing determines the specific locations of the wires.

The success of routing is highly contingent on the floorplanning and the placement. Timing-driven routing is desirable because of the challenges of DSM technologies. It means that, in addition to the space constraints on the router, there are additional constraints requiring critical interconnects to be within certain delay limits once the routing is finished. Considering the complexity of interconnects as distributed RC loads, and the fact that standard routing is already compute-intensive, this may be difficult to do well. However, it is one of the capabilities of today's latest tools.

Parasitic Extraction
Now we are at the point where we can determine, through a Layout Parasitic Extraction (LPE), all the information necessary to analyze the exact timing. The data will generally be specified in DSPF (Detailed Standard Parasitic Format). Extraction is also a very compute-intensive task. However, a lot depends on whether the layout data is hierarchical, and on whether the extraction can be performed hierarchically, even for layout data that is hierarchical. Hierarchy in layouts was discussed in Chapters 2 and 5. Complexity and computation intensity also increase because, for DSM technologies, extraction in 3D is so important. We have seen in Chapter 3 how significant the 3D effects are and how they complicate things.
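
Once a net has been extracted as a distributed RC network, a first-order delay estimate can be obtained with the Elmore approximation, in which each resistance is multiplied by the total capacitance downstream of it. For a two-segment RC chain with illustrative values R1 = R2 = 50 ohms and C1 = C2 = 0.1 pF:

  T ~ R1 * (C1 + C2) + R2 * C2
    = 50 * 0.2 pF + 50 * 0.1 pF
    = 10 ps + 5 ps
    = 15 ps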

After the parasitic extraction, we can model the interconnects and determine the timing of the chip to see if we are close to the desired timing parameters. Now we can decide, based on realistic data, which of the following situations we are facing:

  1. The timing is close enough for us to believe that, without exchanging cells, simply adjusting the physical layout dimensions of certain devices and interconnects in the layout will achieve the desired timing. Such adjustments are generally referred to as IPOs (In-Place Optimizations). The big question here is: just how much can we change the timing this way? The answer, assuming we use all the latest available knowledge, is an amazing amount, as we have suggested in Chapter 3.
  2. The timing is "reasonably" off. We have at least four choices.
  3. If the timing is way off, the chip will probably have to be rearchitected or, worse, the spec will need to be reviewed.
  4. We could look to another foundry and migrate the design to the new process rules. A more careful review of the processing technology may be in order; there may now be a faster one.

The above steps are really what is called Final "Synthesis" in the flow in Fig. 7.2.

Fabrication and Test
What happens now really depends on what happened within the flow. It depends a lot on the changes that had to be made to meet the timing requirements of the design.
The big question now is: have any of the required changes after the third step in the flow (Functional Verification) affected the functionality in any way, and how can we be sure that they did not? If functional changes could have happened, both functional simulation and ATPG (Automatic Test Pattern Generation) need to be redone. Such steps involve major investments in engineering effort and time. Also, test synthesis gets into the picture, because it affects the timing (the capacitive loading) of the design. It might be reasonable to postpone ATPG until the physical design is complete; test patterns are not needed until the end, anyway.

Thus, when it comes to circuits designed for DSM technologies, we need to be vigilant about when we really know that a chip is ready for fabrication based on simulation results, and about which test vectors to use to guarantee the required fault coverage. The functional simulation can be done repeatedly in the flow, with more and more definitive results; after all, only the last functional simulation is the basis for a signoff. Generating a good set of test vectors is very time-consuming, but it needs to be done with diligence and as late in the flow as possible.
The only problem with late test vector generation is discovering that the present design cannot guarantee the required coverage, or that the test requires too many vectors and, therefore, too much time. Then a redesign with scan insertion may be needed, which will greatly change the timing of the design.