Clock and Power Gating Techniques:

There are mainly two types of Power dissipation in CMOS Transistors.

1. Static Power dissipation :
Static Power dissipation is mainly caused due to leakage of Transistors. The leakage could be from any of the sources such as :

1. Gate Leakage through dielectric.
2. Subthreshold leakage when CMOS is off.
3. Junction leakage from source and drain diffusion.
4. Contention Current in ratioed circuits.

Pstat = (Isub + Igate + Icont + Ijunc) * V
Where, V  = Volatage
       I* = Various leakage Currents 

The static Power dissipation is inherent to the properties of Transistor and therefore its efficiency mainly depends on the type/technology of CMOS Transistor. As the Transistors are shrinking Static Power dissipation is increasing as Leakages are higher at smaller Technology nodes.

2. Dynamic Power dissipation:
Dynamic Power dissipation is caused due switching of Transistor i.e from 1->0 or 0 ->1. Also, it can be caused due to short circuit when both pMOS and nMOS are partially ON for very short time :

Pdyn = Psw + Psc
Where, Psw = Switching Power
       Psc = Short Circuit Power
Psw = a * C * V * V * f. 

Where,a = activity factor, 
      V = voltage,
      C = Capacitance,
      f = Frequency

As short-circuit power is often very small, its ignored in Pdyn calculations. Therefore Pdyn is directly proportional to frequency, Capacitance, activity factor & square of voltage. If we are able to reduce any of these factors dynamic power reduces in proportion.

The Total Power is combination of Static and dynamic Power and can be stated as :

Ptot = Pstat + Pdyn

In order to study understand how each Power dissipating factor can be reduced in Static and Dynamic Power dissipation, please refer to Chapter 5 Power of a book CMOS VLSI Design by Neil Weste & David Harris (Refer to Link below).

In a Front-end RTL design, Static Power and dynamic Power can be saved by efficient Power and clock gating techniques.

Power Gating :
In Power gating technique, the source of power to a logic block is turned-off temporarily whenever there is no logical processing needed or no activity is required. This is often done through a dedicated Power Management Unit inside the Design that provides various clock sources to different parts of the design. However, it should be noted that loss of power (e.g P1 domain) results into loss of data so important information such as FSM States and other important Firmware values should be stored (e.g Pon domain) somewhere so that it could be retrieved whenever Power is back and design block is functional again. This useful information is often shared in Power-retention Registers or a local RAM. These Registers and RAM retain power when rest of the logic is powered-off. There are also levels of depth of Power gating depending on the extent and length of the Time, the logic its supposed to be inactive. A good example would be Sleep vs Hibernate in a PC. Here’s an block diagram that gives an overview of Power Gating.

Power Gating Structural Hierarchy

Clock-gating :
Clock gating is a way reducing dynamic Power dissipation by temporary turning-off clock of the Flops on certain parts of the logic or by turning-off enable on gated Flops. In other words, Flops are turned-on only if there is valid information to be stored or transferred. The accuracy with which these clocks are Turned-off is captured by clock gating efficiency. The examples of these 2 types of mechanisms is shown below:

1) Clock Gating with Free running Clocks :

always_ff @(posedge clk or negedge reset) begin
  if(reset)
     Q <= 1'b0;
  else if (clk_en)
     Q <= D;
end


2) Clock Gating with Gated Clocks

always_latch  begin
 if(~clk) 
    clkg_en = enable;
end
   
assign gated_clk = clk & clkg_en;

always_ff @(posedge gated_clk or negedge reset) begin
  if(reset)
     Q <= 1'b0;
  else 
     Q <= D;
end
1) Flop with Clock Enable
power and clock gating
2) Flop with Gated Clock

As a generic guideline, if there a lot of flops in the logic that use same gating enable then its better to design a gated clock implementation. This 2nd implementation is well suited for data-flops and it also decreases Timing-risk on enable (clk_en) as enable signal does not need to travel to every gated flop. Also this type of implementation is able to provide glitch-free clocking. Backend Tools often convert create gated clocks by combining several flops that use same clock gating enable if clock optimization feature is Turned-on during Synthesis. There are also other Tools like PowerArtist that statically analyze RTL design and identify gated Flops, their efficiencies, overall Power of each block and potential flops that could be gated.
The 1st implementation is used where clock gating logic is much diverse and each or some of the flops need a separate functional clock enable. This type of clock gating is also used to hold or freeze the value of flops as a way to debug or Stall the Logic.
There are also levels in Clock gating similar to Power Gating. The first level of clock gating is called Trunk level level gating wherein clock to a a Top level of the design block could be shut-off. Then there is a Leaf level gating wherein parts of the modules could be gated individually while rest of the submodules are still ON.

Clock gating Structural Hierarchy

Clock & Power-down Overrides :
Power and clock overrides are used to ungate the clocks. There maybe cases where there maybe a functional issue that may cause Flops to be gated incorrectly. If such issues occur late in a Project where the Design is in convergence mode it may cause significant delays as Design needs to be re-synthesized after correction of bug. In order to avoid such delays Powerdown & Clock overrides act as backup feature to workaround the issue. These overrides also provide a way of Testing DFT (Design for Test) by allowing way to Scan the flops. The override mechanism is often enabled through Firmware Programming.