Wednesday, March 16, 2011

TOP 10 VERIFICATION MYTHS

1. This is legacy code, no need to verify it - Hold your horses! Are you 100% sure
that you’re dealing with silicon-proven code? Are you sure that nobody’s touched
it since it last worked?
2. I can come up with a patch in 5 minutes - That’s OK as long as you make sure
you don’t end up with a verification environment that looks like a bunch of
patches hooked together. Ask yourself how easy it would be to modify or fix your
environment a week from today. Isn’t it worth taking a couple more minutes
to write robust code?
3. Start checking and plan as you go - A big no-no! Don’t be tempted into this.
Even if your task looks like a piece of cake, always plan in advance. You’ll be
amazed to see how many problems can be completely avoided. Remember the 5
P's: Proper Planning Prevents Poor Performance.
4. This is really simple, no need for a test plan - Consider your test plan document
as your working contract. Whatever you put in it defines the requirements for
your current work. If it’s a simple task then just write a test plan on half a page.
5. Verification is not the product, no need to keep software standards - True,
verification is not the product, but when you’re dealing with thousands of lines
of code, you’d better make sure there is a certain level of consistency, not to
mention freedom from programming errors.
6. Don’t have time to add comments - Remember the last time you spent half a day
on reverse engineering somebody else’s code? How about your own code? Better
yet, start each test with comments explaining the steps in the test and keep them
up to date.
7. Oh I know! Let’s just force the signal from outside - Forced wires have this
tendency to get forgotten along the way and then reappear at a later stage,
usually a week before tapeout. So be extra careful.
8. Must have regression running in the background all the time - Regression
runs alone can’t do the job. You must have a Regression Sitter to monitor and
analyze the results or else - you’re just wearing out your servers.
9. We have reached 100% coverage, there’s no point in running more tests - Not
really. Your coverage model can only capture what you thought about in advance.
Obviously a random test bench is capable of generating additional scenarios that
might reveal a bug, so don’t stop at 100%. Instead - enhance the functional
coverage model.
10. Verifiers should be looking for bugs - This is a common misconception of what
verification is all about. Verifiers should be focused on building a
well-constructed, robust and complete test bench. Bugs will come out on their own.

Thursday, March 10, 2011

Difference between Initial block and Final block in SV

The final block is a new construct that was introduced in SystemVerilog.

The basic difference between the two is evident from the names: an initial block starts executing at simulation time t=0, while a final block executes when the simulation ends.

Before getting into details, note one similarity between these two procedural blocks: each of them executes only once during the simulation.

As for the differences, an initial block can contain # delays, wait statements, or event controls, but a final block must not contain any of these.

A final block must execute in zero simulation time. It is typically used for test case status reporting or for display statements that have to be printed after the test case has completed.

Wednesday, March 25, 2009

Understanding Clock Domain Crossing Issues

Introduction

SoCs are becoming more complex these days. A lot of functionality is being added to chips and data is frequently transferred from one clock domain to another. Hence, clock domain crossing verification has become one of the major verification challenges in deep submicron designs.

A clock domain crossing occurs whenever data is transferred from a flop driven by one clock to a flop driven by another clock.


Figure 1. Clock domain crossing.

In Figure 1, signal A is launched by the C1 clock domain and needs to be captured properly by the C2 clock domain. Depending on the relationship between the two clocks, there could be different types of problems in transferring data from the source clock to the destination clock. Along with that, the solutions to those problems can also be different.

Traditional methods like simulation and static timing analysis alone are not sufficient to verify that the data is transferred consistently and reliably across clock domains. Hence, new verification methodologies are required, but before devising a new methodology it is important to understand the issues related to clock domain crossings properly. Different types of clock domain crossings are discussed here along with the possible issues encountered in each one of them and their solutions. A new verification methodology is then proposed which will ensure that data is transferred correctly across clock domains.

In all the subsequent sections, the signal names shown in Figure 1 are directly used. For example, C1 and C2 imply the source and destination clocks respectively. Similarly A and B are used as source and destination flop outputs respectively. Also, the source and destination flops are assumed to be positive edge triggered.

Clock Domain Crossing Issues
This section describes three main issues which can possibly occur whenever there is a clock domain crossing. The solutions for those issues are also described.

A. Metastability
Problem. If the transition on signal A happens very close to the active edge of clock C2, it could lead to a setup or hold violation at the destination flop "FB". As a result, the output signal B may oscillate for an indefinite amount of time. The output is then unstable and may or may not settle to a stable value before the next clock edge of C2 arrives. This phenomenon is known as metastability, and the flop "FB" is said to have entered a metastable state.

Metastability in turn can have the following consequences from a design perspective:

  1. If the unstable data is fed to several other places in the design, it may lead to a high current flow and even chip burnout in the worst case.
  2. Different fan-out cones may read different values of the signal, and may cause the design to enter into an unknown functional state, leading to functional issues in the design.
  3. The destination domain output may settle down to the new value or may return to the old value. However, the propagation delay could be high leading to timing issues.
For example, see Figure 2. If the input signal A transitions very close to the posedge of clock C2, the output of the destination flop can be metastable. As a result it can be unstable and may finally settle to 1 or 0 as depicted by signals B1 and B2.


Figure 2. Metastability has consequences.

Solution. Metastability problems can be avoided by adding special structures known as synchronizers in the destination domain. The synchronizers allow sufficient time for the oscillations to settle down and ensure that a stable output is obtained in the destination domain. A commonly used synchronizer is a multi-flop synchronizer as shown in Figure 3.


Figure 3. Multi-flop synchronization.

This structure is mainly used for single and multi-bit control signals and single-bit data signals in the design. Multi-bit data signals require other synchronization schemes, such as MUX recirculation, handshake, and FIFO.
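The latency that a multi-flop synchronizer adds can be seen in a small cycle-level model. This is only a sketch in Python, not RTL: the class name `MultiFlopSync` and its `clock` method are hypothetical, and the model does not reproduce metastability itself. It only shows that a change on the input reaches the output after it has passed through every stage.

```python
class MultiFlopSync:
    """Cycle-level model of an N-flop synchronizer (hypothetical helper)."""

    def __init__(self, stages=2, reset_value=0):
        # One entry per flop, first stage first.
        self.flops = [reset_value] * stages

    def clock(self, d):
        """Apply one active edge of the destination clock and return the output."""
        # All flops update together: each one takes its predecessor's old value.
        self.flops = [d] + self.flops[:-1]
        return self.flops[-1]


sync = MultiFlopSync(stages=2)
outputs = [sync.clock(a) for a in [0, 1, 1, 1, 0, 0]]
print(outputs)  # [0, 0, 1, 1, 1, 0]
```

In this model the 0->1 change presented at the second edge appears on the output at the third edge, so the first flop has a full destination clock cycle to settle before its value is used.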

B. Data Loss

Problem. Whenever new source data is generated, it may not be captured by the destination domain in the very first cycle of the destination clock because of metastability. As long as each transition on the source signal is captured in the destination domain, data is not lost. To ensure this, the source data should remain stable for some minimum time, so that the setup and hold time requirements are met with respect to at least one active edge of the destination clock.

If the active clock edges of C1 and C2 arrive close together, the first clock edge of C2, which comes after the transition on source data A, is not able to capture it. The data finally gets captured by the second edge of clock C2 (Figure 4).

However, if there is sufficient time between the transition on data A and the active edge of clock C2, the data is captured in the destination domain in the first cycle of C2.


Figure 4. Effect of metastability on data capture.

Hence, there may not be a cycle by cycle correspondence between the source and destination domain data. Whatever the case, it is important that each transition on the source data should get captured in the destination domain.

For example: Assume that the source clock C1 is twice as fast as the destination clock C2 and there is no phase difference between the two clocks. Further assume that the input data sequence "A" generated on the positive edge of clock C1 is "00110011". The data B captured on the positive edge of clock C2 will be "0101". Here, since all the transitions on signal A are captured by B, the data is not lost. This is depicted in Figure 5.


Figure 5. No data is lost in this case.

However, if the input sequence is "00101111", then the output in the destination domain will be "0011". Here the third data value in the input sequence which is "1" is lost as shown in Figure 6.


Figure 6. Data is lost in this case.
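The two sequences above can be replayed with a few lines of Python. This is a rough sketch under one assumption that the text does not state explicitly: with aligned edges and zero clock-to-Q delay, each C2 edge sees the value launched one C1 cycle earlier, that is, the last bit of each pair. The function names are hypothetical, and comparing transition counts is only a simple proxy for "every transition was captured".

```python
def capture(source_bits, ratio):
    """Sample a source bit string once every `ratio` source cycles."""
    # Assumption: the destination edge sees the last bit of each group of
    # `ratio` source bits (the value launched one source cycle earlier).
    return source_bits[ratio - 1::ratio]


def transitions(bits):
    """Count value changes between neighbouring bits."""
    return sum(1 for prev, cur in zip(bits, bits[1:]) if prev != cur)


def data_lost(source_bits, ratio):
    """True when fewer transitions arrive at the destination than were sent."""
    return transitions(capture(source_bits, ratio)) < transitions(source_bits)


for seq in ("00110011", "00101111"):
    print(seq, "->", capture(seq, 2), "lost" if data_lost(seq, 2) else "ok")
# 00110011 -> 0101 ok
# 00101111 -> 0011 lost
```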

Solution. In order to prevent data loss, the data should be held constant in the source domain long enough to be properly captured in the destination domain. In other words, after every transition on source data, at least one destination clock edge should arrive where there is no setup or hold violation so that the source data is captured properly in the destination domain. There are several techniques to ensure this.

For example, a finite state machine (FSM) can be used to generate source data at a rate, such that it is stable for at least 1 complete cycle of the destination clock. This can be generally useful for synchronous clocks when their frequencies are known. For asynchronous clock domain crossings, techniques like handshake and FIFO are more suitable.

C. Data Incoherency
Problem. As seen in the previous section whenever new data is generated in the source clock domain, it may take 1 or more destination clock cycles to capture it, depending on the arrival time of active clock edges. Consider a case where multiple signals are being transferred from one clock domain to another and each signal is synchronized separately using a multi-flop synchronizer. If all the signals are changing simultaneously and the source and destination clock edges arrive close together, some of the signals may get captured in the destination domain in the first clock cycle while some others may be captured in the second clock cycle by virtue of metastability. This may result in an invalid combination of values on the signals at the destination side. Data coherency is said to have been lost in such a case.

If these signals are together controlling some function of the design, then this invalid state may lead to functional errors.

For example: Assume that "00" and "11" are two valid values for a signal X[0:1] generated by clock C1. As shown in Figure 7, initially there is a transition from 1->0 on both the bits of X. Both the transitions get captured by clock C2 in the first cycle itself. Hence the signal Y[0:1] becomes "00".


Figure 7. Data coherency is lost in this case.

Next, there is a transition from 0->1 on both the bits of signal X. Here the rising edge of clock C2 comes close to the transition on signal X. While the transition on X[0] is captured in the first clock cycle, the transition on X[1] gets captured in second clock cycle of C2. This results in an intermediate value of "10" on Y[0:1] which is an invalid state. Data coherency is lost in this case.

Solution. In the above example, the problem results because all the bits are not changing to a new state in the same cycle of destination clock. If all the bits either retain their original value or change to the new value in the same cycle, then the design either remains in the original state or goes to a correct new state.

Now, if the circuit is designed in such a way that while changing the design from one state to another, only one bit change is required, then either that bit would change to a new value or would retain the original value. Since all the other bits have the same value in both the states, the complete bus will either change to the new value or retain the original value in this case.

This in turn implies that if the bus is Gray-encoded, the problem would get resolved and an invalid state would never be obtained.
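A small Python sketch makes the argument concrete. It enumerates every value the destination could observe if each changing bit is captured either in the first or in the second destination cycle. The helper names are hypothetical, and the model assumes that bits which do not change are always captured correctly.

```python
def to_gray(n):
    """Convert a binary count to its reflected Gray code."""
    return n ^ (n >> 1)


def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def possible_captures(old, new):
    """Values the destination may see while the bus moves from old to new."""
    diff = old ^ new
    changing = [1 << i for i in range(diff.bit_length()) if (diff >> i) & 1]
    seen = set()
    # Each changing bit independently arrives early (flipped) or late (not yet).
    for mask in range(1 << len(changing)):
        value = old
        for j, bit in enumerate(changing):
            if (mask >> j) & 1:
                value ^= bit
        seen.add(value)
    return seen


binary_step = sorted(possible_captures(0b01, 0b10))
gray_step = sorted(possible_captures(to_gray(1), to_gray(2)))
print(binary_step)  # [0, 1, 2, 3]
print(gray_step)    # [1, 3]
```

For the plain binary step from 1 to 2, two bits change, so the destination can briefly see 0 or 3, neither of which is the old or the new value. For the Gray-coded step only one bit changes, so only the old and new codes are possible.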

However, this is applicable only for control busses as it may not be possible to Gray-encode the data busses. In such cases, other techniques like handshake, FIFO and MUX recirculation can be used to generate a common control logic to transfer data correctly.
The MUX recirculation technique is shown in Figure 8.


Figure 8. MUX recirculation technique.

Here, a control signal EN, generated in the source domain, is synchronized in the destination domain using a multi-flop synchronizer. The synchronized control signal EN_Sync drives the select pin of the muxes, thereby controlling the data transfer for all bits of the bus A. In this way, individual bits of the bus are not synchronized separately, and hence there is no data incoherency. However, it is important to ensure that the source domain data A[0:1] is held constant while the control signal is active.



Synchronous Clock Domain Crossings

This section describes various types of synchronous clock domain crossings. Clocks which have a known phase and frequency relationship between them are known as synchronous clocks. These are essentially the clocks originating from the same clock-root. A clock crossing between such clocks is known as a synchronous clock domain crossing. It can be divided into several categories based on the phase and frequency relationship of the source and destination clocks as follows:

  • Clocks with the same frequency and zero phase difference
  • Clocks with the same frequency and constant phase difference
  • Clocks with different frequency and variable phase difference
    • Integer multiple clocks
    • Rational multiple clocks
Not all of the above subcategories are necessarily used in real designs, but they are considered here for completeness and a better understanding of the subject.

While describing all the above cases, it is assumed that the source clock (C1) and the destination clock (C2) have the same phase and frequency jitter and are balanced with the same specifications of clock latency and skew. It is also assumed that the clocks begin with a zero phase difference between them and the "clock to Q" delay of the flops is 0.

Clocks with the same frequency and zero phase difference
This refers to two identical clocks, as the clocks C1 and C2 have the same frequency and 0 phase difference. Note that, since the clocks C1 and C2 are identical and generated from the same root clock, the data transfer from C1 to C2 is essentially not a clock domain crossing. For all practical purposes, this is the case of a single clock design and is considered here for completeness.

Whenever data is transferred from clock C1 to C2, one complete clock cycle of C1 (or C2) is available for data capture as shown in Figure 9.


Figure 9. Clocks with the same frequency and phase.

As long as the combinational logic delay between the source and destination flops is such that the setup and hold time of the circuit can be met, the data will be transferred correctly. The only requirement here is that the design should be STA (static timing analysis) clean. In that case, there will be no problem of metastability, data loss or data incoherency.

Clocks with the same frequency and constant phase difference
These are the clocks having the same time period but a constant phase difference. A typical example is the use of a clock and its inverted clock. Another example is a clock which is phase shifted from its parent clock, for example by T/4 where T is the time period of the clocks.

Clocks C1 and C2 have the same frequency but are phase shifted and C1 is leading C2 by 3T/4 time units (Figure 10).


Figure 10. Same frequency, phase shifted clocks.

Whenever data is transferred from clock C1 to C2, the combinational logic delay is more tightly constrained because the setup/hold margins are smaller. If the logic delay at the crossing is such that the setup and hold time requirements can be met, data will be transferred properly and there will be no metastability. In all such cases, there is no need for a synchronizer. The only requirement here is that the design should be STA clean.

Clocks with different frequency and variable phase difference
These are clocks which have a different frequency and a variable phase difference. There can be two sub-categories here, one where the time period of one clock is an integer multiple of the other and a second where the time period of one clock is a non-integer (rational) multiple of the other. In both cases, the phase difference between the active edges of clocks is variable. These two cases are described in detail below.

A. Integer multiple clocks
In this case, the frequency of one clock is an integer multiple of the other and the phase difference between their active edges is variable. Here the minimum possible non-zero phase difference between the active edges of the two clocks is always equal to the time period of the fast clock.

In Figure 11 clock C1 is 3 times faster than clock C2. Assuming T is the time period of clock C1, the time available for data capture by clock C2 could be T, 2T or 3T depending on which edge of clock C1 the data is launched. Hence, the worst case delay of any path should meet the setup time with respect to the edge with a phase difference of T. The worst case hold check would be made with respect to the edge with 0 phase difference.


Figure 11. Integer multiple clocks.

In all such cases, one complete cycle of the faster clock is always available for data capture, hence it should always be possible to meet the setup and hold requirements. As a result there will be no metastability or data incoherency and a synchronizer is not needed.

However, there can still be a problem of data loss in the case of fast to slow clock crossing. (That is, the source clock is faster than the destination clock.) In order to prevent this, the source data should be held constant for at least one cycle of the destination clock. This can be ensured by using some control circuit, for example, a simple finite state machine (FSM) would work in this case. In the example shown in Figure 11, if the source data is generated once in every 3 cycles of the source clock, there would be no data loss.
For slow to fast crossings, there is no data loss in any case.
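The effect of pacing the source data for the 3:1 example can be sketched in Python. The function names are hypothetical, and the model assumes that each C2 edge samples the last of the three source bits launched since the previous C2 edge.

```python
def hold(data_bits, ratio):
    """Repeat each data bit for `ratio` source cycles, as a pacing FSM would."""
    return "".join(bit * ratio for bit in data_bits)


def capture(source_bits, ratio):
    """Destination samples once per `ratio` source cycles (last bit of each group)."""
    return source_bits[ratio - 1::ratio]


RATIO = 3  # C1 is 3 times faster than C2, as in the Figure 11 example

unpaced = capture("101101", RATIO)           # source changes every fast cycle
paced = capture(hold("1011", RATIO), RATIO)  # source changes every 3rd cycle

print(unpaced)  # 11
print(paced)    # 1011
```

Without pacing, six source values collapse into two captured values. With each value held for three source cycles, all four values arrive in the destination domain.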

B. Rational multiple clocks
In this case, the frequency of one clock is a rational or non-integer multiple of the other clock and the phase difference between the active clock edges is variable.

Unlike the situation where one clock is an integer multiple of the other, here the minimum phase difference between the two clocks can be very small, small enough to cause metastability. Whether or not a metastability problem will occur depends on the value of the rational multiple and on the design technology. Three different cases are considered here with the help of examples.

In the first case, there is a sufficient phase difference between the active edges of the source and destination clocks such that there will be no metastability.

In the second case, the active clock edges of the two clocks can come very close together, close enough to cause metastability problem. However, in this case the frequency multiple is such that, once the clock edges come close together, there would be sufficient margin in the next cycle to capture data properly without any setup or hold violation.

In the third case, the clock edges of the two clocks can be close enough for many consecutive cycles. This is similar to the behavior of asynchronous clocks except that here the clock-root for both the clocks is the same and hence the phase difference between the clocks can be calculated.

Note that in all the examples given below, some delay values are used and it is assumed that a phase margin of less than or equal to 1.5ns between the clock edges can cause metastability. This is just a placeholder value and in real designs, it would be a function of many things including technology used, flop characteristics, etc.

Example 1
This is the case when the active clock edges of both the clocks will never come very close together, and in all cases there would be a sufficient margin to meet the setup and hold requirements of the circuit.

Consider a clock C from which 2 clocks C1 and C2 are derived with a frequency of divide-by-3 and divide-by-2 respectively with respect to clock C. Here clock C1 is 1.5 times slower than clock C2. As shown in Figure 12, the time period of clock C1 is 15ns and of C2 is 10ns. The least possible phase difference between the two clock edges is 2.5ns which should be sufficient to meet setup and hold time requirements.


Figure 12. Clock edges never come very close together.

However, additional combinational logic should not be added at the crossing due to the very small setup/hold margins. If there is any logic, its delay should meet the setup and hold time requirements. If this condition can be met, there will be no metastability and no synchronizer would be required.

Further, if the crossing is a slow to fast crossing, there will be no data loss. However, in case of a fast to slow clock crossing, there can be data loss. In order to prevent this, the source data needs to be held constant for at least one cycle of the destination clock so that at least one active edge of the destination clock arrives between two consecutive transitions on the source data.

Example 2
In this case, the active clock edges of both the clocks can come very close together intermittently. In other words, the clock edges come close together once and then there would be sufficient margin between the edges for the next few cycles (to capture data properly) before they come close again. Here the word "close" implies close enough to cause metastability.

In Figure 13, clocks C1 and C2 have time periods 10ns and 7ns respectively. Notice that the minimum phase difference between the two clocks is 0.5ns, which is very small. So there is a chance of metastability and a synchronizer would be required.

Due to metastability, the data may not be captured in the destination domain when the clock edges are very close together. However, in this case, note that once the clock edges come very close together, in the next cycle there is a sufficient margin so that the data can be captured properly by the destination clock. This is shown by signal B2 in Figure 13. While the expected output would be B1, the actual waveform could look like B2, but still there is no data loss in this case. However there can be an issue of data incoherency as described previously.


Figure 13. Clock edges come close together intermittently.

For a fast to slow crossing, data loss can occur, and in order to prevent this, the source data should be held constant for a minimum of one destination clock cycle. Again, this can be done by the use of a simple FSM.

Example 3
This is the case when the phase difference between the clocks can be very small at times and can remain like that for several cycles. This is very similar to asynchronous clocks except that the variable phase differences will be known and will repeat periodically.

In Figure 14, clocks C1 and C2 have time periods 10ns and 9ns respectively. It can be seen that the active clock edges of both the clocks come very close together for 4 consecutive cycles. In the first two cycles there is a possibility of a setup violation (as the source clock is leading the destination clock) and in the next two cycles there is a possibility of hold violation (as the destination clock is leading the source clock).


Figure 14. Clock edges are close for consecutive cycles.

In this case, there will be an issue of metastability and hence synchronization needs to be done. Apart from metastability, there can also be an issue of data loss, even though it is a slow to fast clock domain crossing. As can be seen from Figure 14, B1 is the expected output if there were no metastability. However, the actual output can be B2. Here the data value '1' is lost, because in the first cycle the value '1' is not captured due to a setup violation and in the second cycle the new value '0' is incorrectly captured due to a hold violation.

In order to prevent data loss, the data needs to be held constant for a minimum of two cycles of the destination clock. This is applicable for both fast to slow as well as slow to fast clock domain crossings. This can be done by controlling the source data generation using a simple FSM. However, the data incoherency issue can still be there.

In such cases, standard techniques like handshake and FIFO are more useful to control data transfer as they will also take care of the data incoherency issue.
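The edge spacing in Example 3 can be explored numerically. The Python sketch below lists, for every source edge in one repeating pattern, the signed distance to the nearest destination edge, and then finds the longest run of edges that fall within the 1.5ns placeholder margin. The function names are hypothetical. The 0.5ns starting offset of C1 is also an assumption: it is not given in the text and is chosen here because it yields four close edges in a row, two with the source leading and two with the destination leading.

```python
from math import gcd


def edge_offsets(src_period, dst_period, src_phase=0):
    """Signed distance from each source edge to the nearest destination edge.

    All times are integers in picoseconds. A negative value means the source
    edge arrives before the destination edge (setup risk); a positive value
    means it arrives after it (hold risk).
    """
    hyperperiod = src_period * dst_period // gcd(src_period, dst_period)
    offsets = []
    for t in range(src_phase, src_phase + hyperperiod, src_period):
        before = t % dst_period      # distance back to the previous dst edge
        after = dst_period - before  # distance forward to the next dst edge
        offsets.append(before if before <= after else -after)
    return offsets


def longest_close_run(offsets, margin):
    """Longest run of consecutive edges within `margin`, pattern taken as periodic."""
    close = [abs(o) <= margin for o in offsets]
    best = run = 0
    for flag in close + close:       # doubled so that a run can wrap around
        run = run + 1 if flag else 0
        best = max(best, run)
    return min(best, len(close))


# C1 = 10ns, C2 = 9ns, hypothetical 0.5ns starting offset on C1.
offsets = edge_offsets(10_000, 9_000, src_phase=500)
print(offsets)
# [500, 1500, 2500, 3500, 4500, -3500, -2500, -1500, -500]
print(longest_close_run(offsets, margin=1_500))  # 4
```

With `src_phase=0` the same function reports a run of 3 instead of 4, so the exact count depends on how the two clocks are aligned.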



Asynchronous Clock Domain Crossings
Clocks which do not have a known phase or frequency relationship between them are known as asynchronous clocks. Whenever there is a clock crossing between two asynchronous clocks, their active edges can arrive very close together leading to metastability. Here the phase difference between the clocks can be variable and unlike synchronous clocks it is unpredictable.

Proper synchronization needs to be done in the destination domain to prevent metastability. Apart from that, there can be problems of data loss and data incoherency (in both fast to slow as well as slow to fast clock crossings). If the source and destination clock frequencies are known, data loss can be prevented by holding the source data constant for two cycles of the destination clock. However, if the circuit is to be designed to be independent of clock frequencies, handshake or FIFO techniques should be used to prevent metastability, data loss and data incoherency.

Verification Methodology

This section describes a methodology which will ensure that the circuit has been designed properly to handle the clock domain crossing issues.

The validation activity can be divided into two categories, namely structural and functional. Structural validation ensures that appropriate synchronization logic has been added wherever it is required and functional validation ensures that the logic which has been added is actually performing the intended function.

A number of CDC problems can be detected just by performing structural validation. These checks are simpler and much faster than the functional validation. Moreover, if there are structural issues, most of the functional validation would have no relevance anyway. Hence, verification should begin with the structural checks and the problems detected there should be corrected before moving on to functional validation.

Rule-based checking is a very efficient way to perform structural validation.
Assertion-based verification techniques can be used to perform functional validation. Assertions can be inferred automatically in the design using some EDA tools, or they can be inserted in the RTL using any of the standard assertion languages like OVL, PSL and SVA. These languages are supported by many EDA vendors.

These assertions can either be simulated in the functional simulation environment or can be verified using formal verification techniques. Both these techniques have their own advantages and disadvantages.

The simulation results are dependent on the quality of test vectors used. A problem may go undetected if the vectors applied cannot stimulate it, and it is very difficult to determine the right set of test vectors which will give good coverage.

Compared to simulation, formal techniques give much better coverage and there is no need to provide any test vectors. However, formal techniques have performance issues because of state space explosion, which is a well-known problem in formal analysis (see reference 4). So these checks are not suitable for full chip analysis, but they work reasonably well at the block level.

A step-by-step approach for verifying clock domain crossings is described here.

Step 1
Check for the presence of valid synchronizers in:

  • All asynchronous clock domain crossings, and,
  • Those cases of synchronous clock domain crossings where there can be metastability as described in the section on rational multiple clocks.

A multi-flop synchronizer is sufficient to ensure that there will be no metastability. However, there can still be a problem of data incoherency. So it is advisable to check at this stage itself that multi-flop synchronizers are used only for scalar signals and control busses. They should not be used for data busses.

A rule-based checker can be used to automatically detect all clock domain crossings and to check for the presence of valid synchronizers at all places where they are required.

If there are missing synchronizers, the designer should modify the design to add appropriate synchronization logic.

Step 2
Check for the presence of separately synchronized signals which are converging. These are probable candidates for data incoherency. These candidates can be identified by doing structural analysis of the design.

The candidate signals for data incoherency should be verified to be Gray-encoded. This validation can be done through assertions. The assertion itself could even be generated by a structural checking tool whenever it sees signals which are candidates for data incoherency. Figure 15 shows a control bus clock domain crossing, which is synchronized using a multi-flop synchronizer but is not Gray-encoded. A waveform trace is generated for the assertion failure.


Figure 15. Formal verification helps catch Gray-encoding failure.
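The property behind such an assertion is simply that no more than one bit of the bus changes between consecutive source clock cycles. As a rough illustration, here is the same check written in Python over a recorded trace of bus values. The function name is hypothetical, and a real flow would express this in an assertion language rather than in a script.

```python
def gray_violations(trace):
    """Return the trace indices at which more than one bit of the bus changes."""
    return [
        i
        for i in range(1, len(trace))
        if bin(trace[i - 1] ^ trace[i]).count("1") > 1
    ]


gray_trace = [0b00, 0b01, 0b11, 0b10, 0b00]    # one bit changes per step
binary_trace = [0b00, 0b01, 0b10, 0b11, 0b00]  # plain binary counter

print(gray_violations(gray_trace))    # []
print(gray_violations(binary_trace))  # [2, 4]
```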

In case the converging signals cannot be Gray-encoded, change the synchronization scheme to one which uses a common control signal, for example, MUX recirculation, FIFO or handshake. These schemes still need to be validated for proper functionality as described in Step 4.

Step 3
Once the proper synchronization logic is in place and the Gray-encoding checks have been done, the next step is to verify that there is no data loss while transferring data from one clock domain to the other. This needs to be checked for the following two cases:

  • Synchronous clock domain crossings
    • All fast to slow crossings
    • Slow to fast crossings where the clock edges can be close together for continuous cycles
  • All asynchronous clock domain crossings
These can be validated by asserting that each source data launch is always captured in the destination domain.

In the case of fast to slow synchronous clock domain crossings, where a synchronizer is not required and for the simple cases of multi-flop synchronization, check that after every transition on the source data an active edge of the destination clock arrives where there is no setup or hold violation.

For other synchronization schemes, some standard functional checks can be done to ensure that there is no data loss, which are described in Step 4.

Step 4
In all cases, where some special synchronization schemes are used, it is necessary to verify that they are performing the intended function correctly. This is important to ensure that there will be no metastability, data incoherency or data loss problem.

The required checks are given here for three commonly used schemes:

  • Handshake synchronization: Check that the request-data and request-acknowledge protocols are working as per the specifications.
  • FIFO synchronization: Check that there is no FIFO overflow or underflow.
  • Mux recirculation: With reference to Figure 8, check that while the synchronized control signal EN_Sync is active, the following two conditions hold:
    • Source data A[0:1] is stable, and,
    • at least one active edge of destination clock arrives
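
As one example of these functional checks, the FIFO overflow and underflow condition can be sketched as a simple occupancy monitor. This is a behavioral model only, assuming the reads and writes have been merged into one time-ordered event string; `check_fifo` and the `'w'`/`'r'` encoding are illustrative choices.

```python
def check_fifo(ops, depth):
    """ops: time-ordered sequence of 'w' (write) and 'r' (read) events.
    Returns a list of (event index, error kind) for each protocol violation."""
    count = 0
    errors = []
    for i, op in enumerate(ops):
        if op == "w":
            if count == depth:
                errors.append((i, "overflow"))   # write into a full FIFO
            else:
                count += 1
        elif op == "r":
            if count == 0:
                errors.append((i, "underflow"))  # read from an empty FIFO
            else:
                count -= 1
    return errors


print(check_fifo("wwwrrr", depth=2))  # third write overflows, third read underflows
print(check_fifo("wrwr", depth=2))    # clean traffic
```
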
The methodology described in the above four steps is also depicted in Figure 16.



Figure 16. The flow of the verification methodology.

Summary
Traditional verification methods like simulation and static timing analysis are not sufficient to detect all types of problems which can occur in clock domain crossings. The problems which can occur depend on the types of clock domain crossings. Similarly, the solutions to those problems are also different and hence the verification techniques required are different as well. Some of the basic problems of clock domain crossings have been discussed here. The solutions to those issues are also discussed and a verification methodology is proposed which will ensure that data is correctly transferred across clock domains.

References

[1] Sanjay Churiwala, "Tackling multiple clocks in SoCs", EE Times March 15, 2004.
[2] Shaker Sarwary, "Solving the toughest problems in CDC analysis", EE Times August 28, 2006.
[3] http://www.asic-world.com/tidbits/metastablity.html
[4] K. McMillan, Symbolic Model Checking, Kluwer Academic Publishers, Boston, 1993.

About the Authors:
Saurabh Verma is an engineering manager at Atrenta. He has a bachelor's degree from the Indian Institute of Technology, Kanpur.
Ashima S. Dabare is a Consulting Applications Engineer at Atrenta. She has a master's degree from the Indian Institute of Technology, Delhi.

Introduction to Metastability

Any asynchronous input from the outside world to a clocked circuit represents
a source of unreliability, since there is always some residual probability
that the clocked circuit will sample the asynchronous signal just at the
time that it is changing.

From a specification point of view, synchronous elements such as flip flops
specify a Setup time and a Hold time. By its nature an
asynchronous input cannot be reliably expected to meet this specification, and
so it will have transitions that fall within the timing window that is bounded
by these two specifications. When this occurs, the result can be one of three
scenarios:

1) The state of the signal prior to the transition is used.
2) The state of the signal after the transition is used.
3) The flip flop goes metastable.

The first two possibilities are of no consequence, since the signal is
asynchronous, but the third possibility is what the rest of this article is
about.

Metastability causes havoc in synchronous systems. It arises from an unstable
equilibrium state, for example when a pair of cross-coupled CMOS inverters is
stuck at mid-voltage. It is impossible to determine how long such a state
will persist. Unfortunately, due to the complexities in today's systems, it is
not possible for the designer to avoid this type of situation.

The most common approach to minimizing the problems of metastability propagating
into our synchronous systems is to use a synchronizing circuit to take the
asynchronous input signal, and align it to the timing regimen of the rest of the
system.

The synchronizer, though, can go metastable itself, and the goal of a designer is
to minimize the probability of this occurring and propagating to the output of
the synchronizer. In current (2004) technology, this can usually be achieved
with a two stage or three stage synchronizer.
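
To see why an extra synchronizer stage helps so much, a commonly quoted first-order model (not taken from this post) estimates the mean time between failures as MTBF = exp(t_r / tau) / (T_w * f_clk * f_data), where t_r is the time allowed for resolution, tau the flip-flop's resolution time constant and T_w its metastability window. The sketch below evaluates it; every numeric value is a hypothetical placeholder, not data for any real process.

```python
import math


def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """First-order MTBF estimate in seconds for a synchronizing flip-flop."""
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)


# Hypothetical device and system parameters (assumptions for illustration only).
TAU = 50e-12        # resolution time constant, seconds
T_WINDOW = 100e-12  # metastability window, seconds
F_CLK = 200e6       # destination clock frequency, Hz (5 ns period)
F_DATA = 10e6       # rate of asynchronous input transitions, Hz

one_stage = synchronizer_mtbf(1e-9, TAU, T_WINDOW, F_CLK, F_DATA)
# A second stage gives the first flop one more full clock period to resolve.
two_stage = synchronizer_mtbf(1e-9 + 1 / F_CLK, TAU, T_WINDOW, F_CLK, F_DATA)

print(f"one stage: {one_stage:.3g} s, two stages: {two_stage:.3g} s")
```

Because the resolution time sits in the exponent, each added clock period of settling time multiplies the estimate by exp(T_clk / tau), which is why two or three stages are usually enough.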

Tuesday, September 16, 2008

Verilog Questions

Q: What is the difference between a Verilog task and a Verilog function?

A:

The following rules distinguish tasks from functions:

1. A function shall execute in one simulation time unit

A task can contain time-controlling statements.


2. A function cannot enable a task

A task can enable other tasks or functions.


3. A function shall have at least one input type argument and shall not have an output or inout type argument;

A task can have zero or more arguments of any type.


4. A function shall return a single value;

A task shall not return a value.


Q: Given the following Verilog code, what value of "a" is displayed?

always @(clk) begin

a = 0;

a <= 1;

$display(a);

end

A:


This is a tricky one! Verilog scheduling semantics basically imply a four-level deep queue for the current simulation time:

1: Active Events (blocking statements, $display)

2: Inactive Events (#0 delays, etc)

3: Non-Blocking Assign Updates (non-blocking statements)

4: Monitor Events ($monitor, $strobe, etc).

Since the "a = 0" is an active event, it executes in the 1st "queue" and takes effect immediately. The "a <= 1" is a non-blocking assignment: its right-hand side is evaluated now, but the update of a is placed into the 3rd queue. The $display call is itself an active event, so it runs right after the two assignments, before the 3rd queue is processed, and shows a = 0. The non-blocking update then sets a to 1 later in the same time step, so a $strobe in place of the $display, or a look at a at any later time, would show 1.
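
The ordering can be mimicked with a few lines of Python. This is only a toy model of one simulation time step, assuming (per the usual reading of the Verilog scheduling rules) that $display executes as an active event while $strobe waits for the monitor region; `simulate_time_step` is an illustrative name.

```python
def simulate_time_step():
    """Toy model of one time step of: a = 0; a <= 1; $display(a);"""
    a = None
    printed = {}
    nba_updates = []           # region 3: pending non-blocking updates

    # Region 1: active events execute in source order.
    a = 0                      # a = 0;   blocking, takes effect immediately
    nba_updates.append(1)      # a <= 1;  value computed now, update deferred
    printed["$display"] = a    # $display(a); sees the blocking result

    # Region 3: non-blocking updates are applied.
    for value in nba_updates:
        a = value

    # Region 4: monitor events see the settled value.
    printed["$strobe"] = a
    return printed


print(simulate_time_step())
```
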

Q: Given the following snippet of Verilog code, draw out the waveforms for clk and a

always @(clk) begin

a = 0;

#5 a = 1;

end

A:

10 30 50 70 90 110 130

___ ___ ___ ___ ___ ___ ___

clk ___| |___| |___| |___| |___| |___| |___| |___

a ___________________________________________________________

This obviously is not what we wanted, so to get closer, you could use "always @ (posedge clk)" instead, and you'd get

10 30 50 70 90 110 130

___ ___ ___ ___ ___ ___ ___

clk ___| |___| |___| |___| |___| |___| |___| |___

___ ___

a _______________________| |___________________| |_______

Q: What is the difference between the following two lines of Verilog code?

#5 a = b;

a = #5 b;

A:

#5 a = b; Wait five time units before doing the action for "a = b;".

The value assigned to a will be the value of b 5 time units hence.

a = #5 b; The value of b is calculated and stored in an internal temp register.

After five time units, assign this stored value to a.
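
The difference only shows when b changes during the delay. The sketch below models both forms in Python, assuming b is described by a sorted list of (time, value) changes and the statement starts executing at time `t_exec`; the helper names are illustrative.

```python
def value_at(timeline, t):
    """timeline: sorted list of (time, value); return the value in effect at time t."""
    current = None
    for change_time, value in timeline:
        if change_time <= t:
            current = value
    return current


def delayed_statement(b_timeline, t_exec, delay=5):
    """#5 a = b;  wait first, then read b and assign."""
    return value_at(b_timeline, t_exec + delay)


def intra_assignment_delay(b_timeline, t_exec, delay=5):
    """a = #5 b;  read b immediately, assign the saved value after the delay."""
    return value_at(b_timeline, t_exec)


b = [(0, 0), (3, 1)]  # b is 0 at time 0 and changes to 1 at time 3

print(delayed_statement(b, t_exec=0))       # a gets the new value of b
print(intra_assignment_delay(b, t_exec=0))  # a gets the old value of b
```

In both cases a is written at time 5; what differs is the moment at which b is read.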

Q: What is the difference between:

c = foo ? a : b; and

if (foo) c = a;

else c = b;

A:

The ?: operator merges the two answers bit by bit if the condition is "x", so for instance if foo = 1'bx, a = 'b10, and b = 'b11, you'd get c = 'b1x.

On the other hand, the if statement treats an X or Z condition as FALSE, so you'd always get c = b.
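
The two behaviors can be modeled on bit strings in Python. This is a simplified sketch, assuming a single-bit condition and operands of equal width; the function names are illustrative.

```python
def merge_on_x(a_bits, b_bits):
    """Bitwise merge used when the condition is unknown:
    keep a bit only where both operands agree on 0 or 1, else produce x."""
    return "".join(x if x == y and x in "01" else "x" for x, y in zip(a_bits, b_bits))


def conditional_operator(cond, a_bits, b_bits):
    """Model of: c = cond ? a : b;"""
    if cond == "1":
        return a_bits
    if cond == "0":
        return b_bits
    return merge_on_x(a_bits, b_bits)   # cond is x or z


def if_else(cond, a_bits, b_bits):
    """Model of: if (cond) c = a; else c = b;  (x or z takes the else branch)"""
    return a_bits if cond == "1" else b_bits


print(conditional_operator("x", "10", "11"))  # 1x
print(if_else("x", "10", "11"))               # 11
```
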

Q: Using the given, draw the waveforms for the following versions of a (each version is separate, i.e. not in the same run):

reg clk;

reg a;

always #10 clk = ~clk;

(1) always @(clk) a = #5 clk;

(2) always @(clk) a = #10 clk;

(3) always @(clk) a = #15 clk;

Now, change a to wire, and draw for:

(4) assign #5 a = clk;

(5) assign #10 a = clk;

(6) assign #15 a = clk;

A:

10 30 50 70 90 110 130

___ ___ ___ ___ ___ ___ ___

clk ___| |___| |___| |___| |___| |___| |___| |___

___ ___ ___ ___ ___ ___ ___

(1)a ____| |___| |___| |___| |___| |___| |___| |_

___ ___ ___ ___ ___ ___ ___

(2)a ______| |___| |___| |___| |___| |___| |___|

(3)a __________________________________________________________

Since the #delay cancels future events when it activates, any delay over the actual 1/2 period time of the clk flatlines...

With changing a to a wire and using assign, we just accomplish the same thing...

10 30 50 70 90 110 130

___ ___ ___ ___ ___ ___ ___

clk ___| |___| |___| |___| |___| |___| |___| |___

___ ___ ___ ___ ___ ___ ___

(4)a ____| |___| |___| |___| |___| |___| |___| |_

___ ___ ___ ___ ___ ___ ___

(5)a ______| |___| |___| |___| |___| |___| |___|

(6)a __________________________________________________________

Friday, September 5, 2008

Rules governing usage of a Verilog function

The following rules govern the usage of a Verilog function construct:
  • A function cannot advance simulation time using constructs such as #, @, etc.
  • A function shall not have nonblocking assignments.
  • A function without a range defaults to a one bit reg for the return value.
  • It is illegal to declare another object with the same name as the function in the scope where the function is declared.

Synthesis Questions

  • What are the various design constraints used while performing synthesis for a design?
    Ans: 1. Create the clocks (frequency, duty cycle).
    2. Define the transition-time requirements for the input ports.
    3. Specify the load values for the output ports.
    4. For the inputs and outputs, specify the delay values (input delay and output delay) which are already consumed by the neighbouring chip.
    5. Specify the case setting (in case of a mux) to restrict timing reporting to specific paths.
    6. Specify the false paths in the design.
    7. Specify the multi-cycle paths in the design.
    8. Specify the clock-uncertainty values (w.r.t. jitter and the margin values for setup/hold).
    9. Specify a few Verilog constructs which are not supported by the synthesis tool.


  • What are the various design changes you do to meet design power targets?
    Ans: Design with multiple supply voltages (multi-VDD): areas which require high performance run at high VDD and areas which need low performance run at low VDD. This is done by creating voltage islands and making sure that appropriate level shifters are placed at the cross-voltage domain boundaries.
    Design with multiple threshold voltages (multi-Vt): areas which require high performance use low-Vt cells, which draw a lot of leakage current, and areas which require low performance use high-Vt cells, which have low leakage. Incorporating this into the design process reduces leakage power.
    As clocks consume a large share of the power in a design, placing optimal clock-gating cells and controlling them with the module enables gives a lot of power savings.
    As clock trees always switch, make sure that most of the clock buffers sit after the clock-gating cells; this reduces switching and thereby power.
    Incorporate Dynamic Voltage and Frequency Scaling (DVFS) based on the application, reducing the system's voltage and frequency when the application does not need to meet the performance targets.
    Ensure through IR-drop analysis and ground-bounce analysis that the design is within the specification requirements.
    Place power switches so that leakage power can be reduced.
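
To get a feel for why voltage and frequency scaling pays off, the standard first-order expression for switching power, P = alpha * C * VDD^2 * f, can be evaluated directly. The sketch below does that; the activity factor, capacitance and operating points are hypothetical numbers chosen only for illustration.

```python
def dynamic_power(activity, capacitance, vdd, frequency):
    """First-order switching power in watts: alpha * C * VDD^2 * f."""
    return activity * capacitance * vdd ** 2 * frequency


# Hypothetical block: 20% activity, 1 nF of switched capacitance.
nominal = dynamic_power(activity=0.2, capacitance=1e-9, vdd=1.0, frequency=500e6)
scaled = dynamic_power(activity=0.2, capacitance=1e-9, vdd=0.8, frequency=250e6)

print(f"nominal: {nominal:.3f} W, scaled: {scaled:.3f} W, ratio: {scaled / nominal:.2f}")
```

Halving the frequency alone would halve the power; lowering VDD at the same time helps further because the voltage term is squared.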


  • What is meant by library characterization?
    Ans: Characterizing the library cells in terms of delay, power consumption, etc.

  • What is meant by a wireload model?
    Ans: The synthesis tool models wires using a concept called "wireload models". Wireload models are statistical models based on fanout. For a particular technology, previous chip experience gives a rough estimate: if a wire drives "n" fanouts, its delay is estimated as, say, "x" delay units. A model file is created with the fanout numbers and the corresponding estimated delay values. This file is used during synthesis to estimate the delay of wires; to estimate the delay of cells, technology-specific library model files are available.
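
A fanout-indexed table of this kind can be sketched as a lookup with linear interpolation. The sketch follows the simplified description in the answer above (fanout mapped straight to delay); the table values are hypothetical, and real library formats differ in what they tabulate.

```python
# Hypothetical wireload table: (fanout, estimated wire delay in ns).
WIRELOAD_TABLE = [(1, 0.02), (2, 0.05), (4, 0.12), (8, 0.30)]


def estimated_wire_delay(fanout, table=WIRELOAD_TABLE):
    """Estimate wire delay for a net from its fanout by table lookup."""
    if fanout <= table[0][0]:
        return table[0][1]
    # Interpolate linearly between the two surrounding table entries.
    for (f0, d0), (f1, d1) in zip(table, table[1:]):
        if fanout <= f1:
            return d0 + (d1 - d0) * (fanout - f0) / (f1 - f0)
    # Beyond the last entry: extrapolate with the slope of the last segment.
    (f0, d0), (f1, d1) = table[-2], table[-1]
    return d1 + (d1 - d0) * (fanout - f1) / (f1 - f0)


for n in (1, 3, 8, 10):
    print(n, round(estimated_wire_delay(n), 4))
```
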

  • What measures are taken to design for optimized area?
    Ans: As silicon real estate is very costly and savings are directly proportional to the company's revenue, a lot of emphasis is placed on designs with optimal area utilization. The steps to reduce area are:
    If a path is not timing-critical, optimize its cells to use low-drive-strength cells, which saves area.
    Abut the VDD rows.
    Analyze the utilization numbers across multiple floorplanning versions to arrive at optimized area targets.

  • What do you consider while performing floorplanning?
    Ans: Study the data-flow graph of the design and place the blocks accordingly, to reduce the weighted sum of area and wire length.
    Minimize the usage of blocks with shapes other than squares, such as those having notches.
    Place the blocks based on accessibility/connectivity, thereby reducing wire length.
    Abut the memories if their pins are on one side, so that area can be reduced.
    If a memory communicates with the outside world frequently, placing it at the boundary makes sense.
    Study the number of pins to be routed and, with the minimum metal width allowed, estimate the routability issues.
    Study the architecture and application so that the blocks which will be enabled are scattered, to reduce the power-ground noise.


  • What measures are taken in the design to meet signal-integrity targets?
    Ans: As more and more devices are packed together, areas become more congested and coupling capacitance dominates the wire capacitance, which creates SI violations. The measures to reduce or solve them are:
    As the clock tree runs across the whole chip, optimizing it for SI is essential: route the clock with double pitch and triple spacing.
    In case of an SI violation, spacing the signal nets apart reduces cross-talk impact.
    Shield high-frequency signal nets with power nets to protect them from SI problems.
    Enable SI-aware routing, so that the tool takes care of SI.
    Run SI-enabled STA and make sure the design meets the SI requirements.
    Route signals on different layers orthogonal to each other.
    Minimize the parallel run length of wires by inserting buffers.