Lecture 3: CDC and Generate Constructs
Last Update: 2023-11-16
> Clock Domain Crossing
Clock domain crossing (CDC) refers to the point in a circuit where a signal needs to be transmitted from one “clock domain” to another. A clock domain is a region of a circuit which runs on a single clock. Usually, this means that it has its own frequency and phase, which are independent of those in other regions of a design.
Why do we worry about CDC? Recall from lecture 1 that every flip flop has a setup and hold time that cannot be violated without the chance of it entering a metastable state. If the flop does enter a metastable state, then the length of time which it stays in this state is well modeled by a probability distribution (typically an exponential one). For example, let’s say that we have some flip flop whose mean metastability time is 1us. This means that each time the flip flop goes metastable, it will take 1us on average to transition to a stable state. If the clock feeding this flip flop is running at 1 MHz (a 1us period), a large fraction of the cycles where the flip flop is metastable (about 37%, or e^-1, under an exponential model) will not have a valid output value by the time the next clock edge comes around. Figure 1 demonstrates this effect and how an invalid logic level may propagate chaos downstream.
Figure 1: Metastable-induced chaos
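To put a number on the example above, here is a small worked calculation. It assumes the resolution time T follows an exponential distribution with mean τ, which is the usual textbook model but is an assumption here rather than something stated in the lecture. The symbols T, t and τ are introduced only for this sketch.

```latex
P(T > t) = e^{-t/\tau}, \qquad
P(T > 1\,\mu\mathrm{s})\,\big|_{\tau = 1\,\mu\mathrm{s}} = e^{-1} \approx 0.37, \qquad
P(T > t) = \tfrac{1}{2} \iff t = \tau \ln 2 \approx 0.69\,\tau
```

Under this model, roughly 37% of metastable events are still unresolved after one mean resolution time, and the probability falls to one half at about 0.69 of the mean.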
Obviously, this is not desired behavior. There are a number of ways that this issue can be resolved:
• Get a faster logic family (costs more)
• Decrease the clock rate (increase clock period) (reduces system performance)
• Put multiple flip flops in series (increases latency)
For this discussion, we will focus on the last option: putting multiple flip flops in series. How does this help? Remember that during the time a flip flop is metastable, the output voltage can vary. If this voltage settles near a valid logic level by the time the next clock edge comes around, then a second flip flop can capture it cleanly. Each additional flip flop gives the signal roughly one more clock period to settle, so the probability of a metastable output on the last flip flop shrinks multiplicatively with every stage added.
Figure 2: Chaining multiple flip flops to “reduce” metastability time.
What does this have to do with CDC? Since each clock domain in a circuit may be operating at different frequencies and phases relative to each other, there is no guarantee that data output from one domain will meet the setup and hold time requirements for the receiving domain. The most basic of CDC circuits takes advantage of the multiple series flip flops to reduce the probability of a metastability failure. Figure 3 shows this arrangement along with a single flip flop on the transmitting clock domain just to ensure a clean edge entering the receiving domain.
Figure 3: Basic CDC synchronizer
Of course, this only works if the two clock domains’ clock frequencies are relatively close to each other. If the transmitting domain is significantly faster than the receiving domain, then the receiver may miss the transmitting domain’s data entirely. There are solutions to this problem which use feedback to handshake data, but those are beyond the scope of this course. If you are interested in more information about CDC, I highly recommend looking for the paper “Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog” by Clifford E. Cummings (Sunburst Design).
CDC: Verilog
How can we write this in Verilog? A quick attempt might be to do something along the lines of Listing 1.
module cdc_basic (
    input  clk_in, clk_out,
    input  unsync_in,
    output sync_out
);
    reg       input_flop;
    reg [1:0] sync_flops;

    // Input flip flop (transmitting clock domain)
    always @(posedge clk_in) begin
        input_flop <= unsync_in;
    end

    // Receiver synchronization chain: shift the new sample into bit 0
    always @(posedge clk_out) begin
        sync_flops <= { sync_flops[0], input_flop };
    end

    // The last flop in the chain drives the output
    assign sync_out = sync_flops[1];
endmodule // cdc_basic
Listing 1: Basic CDC module
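To see the synchronizer's latency in simulation, a small testbench along these lines could be used. This is only a sketch: the testbench name, the clock periods (10 ns and 14 ns) and the stimulus times are arbitrary values chosen for illustration, and it assumes the port names declared in Listing 1.

```verilog
`timescale 1ns / 1ps

// Hypothetical testbench sketch for cdc_basic (not part of the lecture code).
// All timing values below are arbitrary illustration choices.
module cdc_basic_tb;
    reg  clk_in    = 1'b0;
    reg  clk_out   = 1'b0;
    reg  unsync_in = 1'b0;
    wire sync_out;

    // Two free-running clocks with unrelated periods (10 ns and 14 ns)
    always #5 clk_in  = ~clk_in;
    always #7 clk_out = ~clk_out;

    // Device under test, connected by port name
    cdc_basic dut (
        .clk_in    (clk_in),
        .clk_out   (clk_out),
        .unsync_in (unsync_in),
        .sync_out  (sync_out)
    );

    initial begin
        // Print a line whenever the input or the synchronized output changes
        $monitor("t=%0t unsync_in=%b sync_out=%b", $time, unsync_in, sync_out);
        #23  unsync_in = 1'b1;   // change at a time unrelated to either clock edge
        #100 unsync_in = 1'b0;
        #100 $finish;
    end
endmodule
```

Keep in mind that an ordinary RTL simulation like this does not model metastability itself. It only shows the extra clock cycles of latency that the chain introduces.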
This creates a flip flop for the input and two flip flops for the synchronization chain. This works just fine, but we could parameterize it to make it more useful in future projects. Take a look at Listing 2. This makes use of a new Verilog construct, the “generate” block.
`timescale 1ns / 1ns
/**
* Clock Domain Crossing generic module
*
* Authors:
* * Electronics Tinkerer 2023-06-02
* * (Backported to Verilog 2023-11-02)
*
* **** USAGE ****
* PARAMETERS:
* STAGES - The number of synchronization flops on the output clock domain
* W - The width of the data to synchronize
*
* INPUTS:
* clk_in - Input clock
* clk_out - Output clock
* d_in - Data which is synchronized to the clk_in clock domain
*
* OUTPUTS:
* d_out - Data synchronized to the clk_out clock domain
*/
module cdc_generic
#(
    parameter STAGES = 2,
    parameter W = 1
) (
    input  wire         clk_in, clk_out,
    input  wire [W-1:0] d_in,
    output wire [W-1:0] d_out
);
    // One W-bit register per synchronizer stage
    reg [W-1:0] cdc_stages [0:STAGES-1];
    reg [W-1:0] cdc_input;

    // Input flip flops (clk_in domain)
    always @(posedge clk_in) begin
        cdc_input <= d_in;
    end

    // Synchronization chain (clk_out domain), one always block per stage
    generate
        genvar ii;
        for (ii = 0; ii < STAGES; ii = ii + 1) begin : cdc_stage_expansion
            always @(posedge clk_out) begin
                if (ii == 0) begin
                    cdc_stages[ii] <= cdc_input;
                end
                else begin
                    cdc_stages[ii] <= cdc_stages[ii-1];
                end
            end
        end
    endgenerate

    assign d_out = cdc_stages[STAGES-1];
endmodule // cdc_generic
Listing 2: Introducing the “generate” loop
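As a usage sketch, the module could be instantiated with its parameters overridden by name, as below. The wrapper module and the signal names (clk_a, clk_b, flag_a, flag_b, u_flag_sync) are hypothetical and exist only for illustration. Only the cdc_generic parameter and port names come from Listing 2.

```verilog
// Hypothetical parent module showing one way to instantiate cdc_generic.
module cdc_generic_example (
    input  wire clk_a,    // transmitting-domain clock (illustrative name)
    input  wire clk_b,    // receiving-domain clock (illustrative name)
    input  wire flag_a,   // single-bit signal generated in the clk_a domain
    output wire flag_b    // same signal, synchronized to clk_b
);
    // Three synchronizer stages, one bit wide
    cdc_generic #(
        .STAGES (3),
        .W      (1)
    ) u_flag_sync (
        .clk_in  (clk_a),
        .clk_out (clk_b),
        .d_in    (flag_a),
        .d_out   (flag_b)
    );
endmodule
```

Note that setting W greater than 1 synchronizes each bit independently, so the bits of a changing multi-bit value are not guaranteed to arrive in the same clk_out cycle.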
Let’s break this down. First are the input and output clocks. These are followed by parameterized width data input and data output ports. After the port declarations are two “reg” declarations. “cdc_input” looks like a typical width-parameterized register. “cdc_stages” on the other hand, has an extra “[0:STAGES-1]” after the name. This syntax creates a 1-dimensional unpacked array of W-width values. This will be used to store each stage of the synchronizer chain later. Next is an always block to synthesize the input clock domain flip flops.
Finally, there is the generate block. Generate blocks always begin with the keyword “generate” and end with “endgenerate.” Note that there is no trailing semicolon. Next, the induction variable for the generate loop is declared as a “genvar.” To make the variable easier to locate during code refactoring, I have named it “ii”, but you could use something else if you like. Following this is the “for” loop. Each iteration of this loop generates a new hardware construct. It is a little bit like a C macro: it does not run at “run time,” but is instead expanded during compilation (elaboration). Within the “for” is an always block that is generated once per iteration. Based on which if/else condition is matched in each iteration, the synthesizer will either synthesize a data path from the input flip flops to the first stage of the synchronization chain or a path between adjacent stages of the chain.
After the generate block, there is a continuous assignment to grab the last stage of the synchronization chain and output it to the “d_out” port.
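To make the expansion concrete, here is roughly what the generate loop amounts to when STAGES is 2, written out by hand. The module name cdc_unrolled_2stage is hypothetical. This is meant as an illustration of the loop's effect, not as a replacement for Listing 2.

```verilog
// Hand-unrolled sketch of cdc_generic with STAGES fixed at 2 (illustration only).
module cdc_unrolled_2stage
#(
    parameter W = 1
) (
    input  wire         clk_in, clk_out,
    input  wire [W-1:0] d_in,
    output wire [W-1:0] d_out
);
    reg [W-1:0] cdc_stages [0:1];
    reg [W-1:0] cdc_input;

    // Input flip flops (clk_in domain)
    always @(posedge clk_in) begin
        cdc_input <= d_in;
    end

    // Iteration ii = 0: only the (ii == 0) branch has any effect
    always @(posedge clk_out) begin
        cdc_stages[0] <= cdc_input;
    end

    // Iteration ii = 1: only the else branch has any effect
    always @(posedge clk_out) begin
        cdc_stages[1] <= cdc_stages[0];
    end

    assign d_out = cdc_stages[1];
endmodule
```

Writing the stages out by hand like this works for a fixed chain length, but every change in length means editing the code. The generate loop produces the same structure from a single parameter.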
A good question is “why do we need to parameterize the length of the synchronization chain?” Each stage of the chain reduces the probability of failure and thus increases the Mean Time Between Failures (MTBF). Adding more flip flops greatly increases the MTBF. For very high-speed designs, it is not uncommon to see three or more stages. Remember that even a small probability of error can result in frequent data transmission errors. For example, a CDC block with an average per-cycle probability of failure of just 0.00001% (1e-7) running at 1GHz will have about 100 errors every second on average (1e9 cycles per second × 1e-7).
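The effect of chain length on MTBF can be sketched with the commonly used synchronizer MTBF model. The symbols here are not from the lecture and are assumptions of this sketch: t_r is the time available for the metastable output to resolve, τ is the flip flop's resolution time constant, T_0 is a device-specific constant describing the susceptibility window, f_clk is the receiving clock frequency, and f_data is the rate at which the incoming data toggles.

```latex
\mathrm{MTBF} = \frac{e^{t_r/\tau}}{T_0 \, f_{\mathrm{clk}} \, f_{\mathrm{data}}}
\qquad\Longrightarrow\qquad
\frac{\mathrm{MTBF}_{N+1}}{\mathrm{MTBF}_{N}} \approx e^{T_{\mathrm{clk}}/\tau}
```

Each added stage extends t_r by roughly one receiving clock period T_clk (less setup time and clock-to-output delay), so under this model the MTBF grows exponentially with the number of stages. The same formula also shows why faster clocks, which shorten T_clk, tend to call for longer chains.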