

# Clock Gating for Low Power Circuit Design by Merge and Split Methods

# K. Hariharan,

Asst. Prof., Department of Electronics and Communication Engineering, SASTRA University, Thanjavur, Tamil Nadu, India

# C. Jayakumar,

M.Tech VLSI Design, School of Computing (SOC), SASTRA University, Thanjavur, Tamil Nadu, India

#### ABSTRACT

In present VLSI technology, energy dissipation is an important design factor in portable devices, alongside area, speed and performance. The shrinking size and growing complexity of portable devices have led to high power dissipation. As a result, low power design has become an essential part of today's devices. In this paper, power dissipation is reduced using the clock gating technique, which lowers dynamic power dissipation by disabling the clock to logic whenever that logic is not in use. Merge and split clock gating concepts are applied to our design to reduce power dissipation. Experimental results show that our design achieves low power dissipation.

Keywords - Clock gating, Merge, Split, Switching power, Cadence RTL tool.

# I. INTRODUCTION

In today's power-hungry world, energy dissipation is an important factor to be considered alongside area, speed and performance. Advances in technology have produced portable devices that are becoming smaller in size and more complex in design. This size reduction and complexity have resulted in higher power consumption and, consequently, higher power dissipation. As a result, low power design has become an essential part of today's devices. The main effect of power dissipation is the heat generated by the device. The resulting rise in temperature shortens the lifetime of the transistors, which affects the reliability of the device.

Many low power techniques are in use at present, including clock gating, multiple supply voltages, power gating, multi-threshold-voltage transistors and gate sizing. Each technique has advantages and disadvantages depending on the scenario in which it is used. These techniques are employed at different stages of the RTL-to-GDSII flow. The tools are equipped with low power features and supporting libraries to help the designer meet the power budget.

Early work focused on skew reduction, which resulted in low-skew and zero-skew algorithms. As designs became smaller and more complex, however, power dissipation gradually gained importance [1]. Clock gating was then planned after placement, but the power planning capability was not fully utilized [2]. Partition-based cell placement proposed inserting clock gates based on power estimation, but the clock gate enable timing probability was not considered. Clock gate splitting was explored, but merging was not successfully accomplished for instances that gate different clock domains [3]. This paper presents a merge and split technique for clock gating elements that reduces power dissipation based on the synthesized clock tree.

# **II. POWER DISSIPATION IN CMOS:**

There are two types of power dissipation in a CMOS device: dynamic and static. Dynamic power dissipation is due to switching activity, and static power dissipation is due to the leakage current of the transistors. Static power is dissipated whenever the device is powered, even when no signal switches, whereas dynamic power is the component that affects the system while it operates, for example when the design is implemented on a field programmable gate array or a hardware module [4]. Dynamic power combines the switching power of the transistors with the short circuit power drawn when the two transistors of a CMOS gate are turned on simultaneously.



Figure-1 CMOS power dissipation model

Total power = dynamic power + static power

Dynamic power = switching power + short circuit power

Static power = leakage power

Hence the equation can be rewritten as

Total power = dynamic power + leakage power

Switching power $= C \cdot V^2 \cdot F \cdot AF$

Short circuit power $= I_{sc} \cdot V \cdot F$

Leakage power $= f(V_{dd}, V_{th}, W/L)$, that is, a function of the supply voltage, the threshold voltage and the transistor width-to-length ratio

- C: capacitance.
- V: voltage.
- F: frequency.
- AF: net switching activity factor.
- Isc: short circuit current.
- Vdd: supply voltage.
- Vth: threshold voltage.
- W: transistor width.
- L: transistor length.
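As a rough illustration of the switching power relation above, the following Python sketch evaluates C * V^2 * F * AF at one operating point and shows the effect of halving the activity factor, which is what clock gating aims at. The capacitance, voltage, frequency and activity values are assumptions chosen only for illustration; they are not measurements from this design.

```python
def switching_power(c, v, f, af):
    """Switching power in watts: C * V^2 * F * AF."""
    return c * v ** 2 * f * af


# Hypothetical operating point, chosen only for illustration.
C_LOAD = 10e-12   # switched capacitance in farads (assumed)
V_DD = 3.3        # supply voltage in volts (assumed)
F_CLK = 50e6      # clock frequency in hertz (assumed)

# Halving the activity factor models registers whose clock is gated
# for half of the cycles in which it would otherwise toggle.
ungated = switching_power(C_LOAD, V_DD, F_CLK, af=0.5)
gated = switching_power(C_LOAD, V_DD, F_CLK, af=0.25)

print(f"ungated: {ungated * 1e3:.4f} mW")
print(f"gated:   {gated * 1e3:.4f} mW")
```

Because the relation is linear in AF, any reduction in switching activity translates directly into the same fractional reduction in switching power, while leakage power is unaffected.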

# **III. CLOCK GATING TECHNIQUE:**

Portable devices such as mobile phones, iPods and laptops consume considerable energy, which can exhaust the battery within a short time. Most of the power dissipation is dynamic, which necessitates a reduction in switching power dissipation. In portable devices there is an opportunity to switch off a part of the circuit that is not in use for a certain time. This reduces the dynamic power dissipation of the device, thereby reducing overall power.

Clock gating is a technique in which part of the design is gated, that is, registers that do not change their state are not given clock signals. This saves the power otherwise spent storing the same bit again in the flip-flop. Clock gating can be applied to a design at several levels: system level clock gating, combinational clock gating and sequential clock gating.



Figure-2 Clock gate with latch

In system level clock gating, a module in a design is gated when not in use. For example, when a mobile phone is left idle it switches off the display and other features that are not always needed, which results in a considerable reduction in power. Sequential clock gating switches off the clock to the flip-flops of a pipelined design that are not in use during a given stage. Sequential clock gating is difficult to achieve, and the tools do not provide the capability to implement it; hence the designer has to predict and verify the result, which is very difficult.

RTL clock gating is a technique in which the architecture is analyzed for certain conditions; registers that satisfy the conditions can be clock gated. The gating can be written into the RTL code, or clock gating components can be inserted during synthesis of the design.

The conditions for clock gating a design are: the registers should have a feedback path, the enable logic of the feedback mux should be determined, and the logic conditions that produce the output should be known. Gating elements are inserted wherever these conditions hold and the insertion results in a considerable power reduction.
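The behaviour of a latch-based clock gate can be sketched in a few lines of Python. This is a minimal behavioural model, assuming the common arrangement in which the enable is captured by a latch that is transparent while the clock is low and the gated clock is the AND of the clock and the latched enable; the exact cell used by the synthesis library may differ, and the sample waveforms are hypothetical.

```python
def gated_clock(clk_samples, enable_samples):
    """Behavioural model of a latch-based clock gate.

    The latch is transparent while clk is 0 and holds its value while
    clk is 1, so enable changes during the high phase cannot glitch
    the gated clock.
    """
    latched = 0
    out = []
    for clk, en in zip(clk_samples, enable_samples):
        if clk == 0:
            latched = en          # latch follows enable in the low phase
        out.append(clk & latched)  # AND gate produces the gated clock
    return out


def rising_edges(samples):
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a == 0 and b == 1)


# Hypothetical waveforms: the register needs updating only in the
# first and last clock cycles.
clk = [0, 1, 0, 1, 0, 1, 0, 1]
enable = [1, 1, 0, 0, 0, 1, 1, 1]

gclk = gated_clock(clk, enable)
print("clock edges seen without gating:", rising_edges(clk))
print("clock edges seen with gating:   ", rising_edges(gclk))
```

Each suppressed rising edge is a cycle in which the gated flip-flops and the clock net behind the gate do not switch, which is where the dynamic power saving comes from.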

# **IV. CLOCK TREE SYNTHESIS:**





The clock signal starts from the PLL and reaches the flip-flops and other sequential elements. The path that the clock signal travels to reach these flip-flops is called the clock tree [7]. The clock tree starts from the PLL source and branches to a number of sequential elements. Without balancing, the clock signals that originate from the PLL do not reach the flip-flops at the same time. Figure 3 shows the clock tree structure. Clock tree synthesis is the process by which the clock signal is buffered to all the sequential elements so that it reaches them at almost the same time, that is, with minimal skew.

The clock tree synthesis report gives the clock tree structure and the phase delay for the different flip-flops in a design. By inserting clock gating elements in a design, we can compare its timing and power with those of the ungated design.
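To make the relation between phase delay and skew concrete, the sketch below computes skew as the difference between the largest and smallest clock arrival times at the flip-flops. The flip-flop names and delay values are hypothetical and are not taken from the clock tree synthesis report of this design.

```python
def clock_skew(phase_delays_ps):
    """Skew = latest clock arrival minus earliest clock arrival (in ps)."""
    arrivals = list(phase_delays_ps.values())
    return max(arrivals) - min(arrivals)


# Hypothetical phase delays (source to clock pin) in picoseconds.
phase_delays = {
    "ff_a": 812.0,
    "ff_b": 840.5,
    "ff_c": 826.0,
}

print("skew (ps):", clock_skew(phase_delays))
```

A clock gating cell placed on one branch adds delay to that branch only, which is one reason gated designs tend to show larger skew than ungated ones unless the tree is rebalanced.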





# V. CLOCK GATE SPLIT AND MERGE:

The clock tree structure, together with the clock tree synthesis report, gives vital information about the skew and slack in the clock distribution network. High fan-outs, and the gating of flip-flops by individual gating instances, result in slack and skew problems. By using the splitting and merging techniques appropriately, the power dissipation can be reduced. Split is the process by which the registers driven by a clock gating instance are divided among several instances, each driving a fixed number of registers. Merge is the process by which flip-flops that are gated at the same time are placed under a common clock gating instance.
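The grouping behind split and merge can be sketched as follows. This is only an illustration of the idea, not the algorithm used by the synthesis tool: "gated at the same time" is modelled here as sharing an identical enable signal, and the instance, enable and register names are hypothetical.

```python
from collections import defaultdict


def split_gate(registers, max_fanout):
    """Split the registers of one gating instance into groups of at
    most max_fanout registers, one group per new gating instance."""
    return [registers[i:i + max_fanout]
            for i in range(0, len(registers), max_fanout)]


def merge_gates(gates):
    """Merge gating instances that share the same enable signal.

    gates maps an instance name to (enable_signal, [registers]).
    Returns a mapping from enable signal to the combined register list.
    """
    merged = defaultdict(list)
    for enable, regs in gates.values():
        merged[enable].extend(regs)
    return dict(merged)


# Split: one gate driving ten registers, limited to a fan-out of four.
groups = split_gate([f"r{i}" for i in range(10)], max_fanout=4)
print(len(groups), "gating instances after split")

# Merge: cg0 and cg2 share enable en_a, so they collapse into one gate.
gates = {
    "cg0": ("en_a", ["r0", "r1"]),
    "cg1": ("en_b", ["r2"]),
    "cg2": ("en_a", ["r3"]),
}
print(merge_gates(gates))
```

Splitting trades extra gating cells for lower fan-out per cell, while merging does the opposite; which one helps depends on the skew and slack reported for the clock tree.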

# VI. I2C CONTROLLER:

The I2C (inter-integrated circuit) controller is an interface device used for communication between ICs in an SoC design. There are different interface devices in use today, but I2C (Figure 4) uses only two wires to communicate between a large number of ICs, which makes it effective where few wires are available for communication [5]. The two wires are SDA, the serial data line, and SCL, the serial clock line.



The I2C bus is connected to many devices, each of which can serve as either master or slave. Only one device acts as master at a given time, while many slaves can be connected. The I2C controller has many applications and is used in most of the interfaces in a chip. I2C devices work by controlling the clock line: the device that controls the clock line (SCL) acts as master and the others act as slaves. Since most devices make use of the I2C controller, we can try to reduce its power by clock gate insertion.

# VII. CLOCK GATE IMPLEMENTATION:

The design is implemented in RTL Compiler and the low power capability of the tool is invoked. The code is written in Verilog and given as input to RTL Compiler through Tcl scripting. The technology files are linked with the tool, which uses the standard cell library to calculate the area, power and delay of the design. In this paper the AMI 350 nm technology standard cell library is used [6].



Figure -5 Netlist for I2C controller (split)

The design is elaborated from the top module down to the submodules used in the design. The SDC constraints are given directly through the Tcl script, and the design is elaborated and synthesized. Both generic and technology mapping are done, after which power, area and timing data are obtained. The split and merge of clock gating cells are performed through the RC script window. The technology-specific netlist and SDC files are then obtained. Figures 5 and 6 show the netlist schematics for the I2C controller.

These files are given as input to SoC Encounter to obtain the clock tree structure and timing information. The I/O pads are added to the design along with the technology and timing files. The design is then floorplanned and the components are placed on the core.





Figure -6 Netlist for I2C controller (merge)

Clock tree inputs, such as the clock buffers and inverters to be used, are then specified for generating the clock tree structure. The clock tree synthesis report gives the number of flip-flops, clock buffers and sub-trees, together with skew and slack information.

# **VIII. RESULTS:**

The results are tabulated for the ungated, gated, split and merge designs. Table 1 compares the ungated and gated designs: gating decreases power, area and cell count, but the timing slack becomes slightly more negative and the skew increases. Table 2 compares the split and merge designs: merge shows higher skew than split but a smaller negative slack. Figures 7 and 8 show the clock tree structures.



Figure -7 Clock tree structure for gated I2C (split)





Figure -8 Clock tree structure for gated I2C (merge)

|                    | Without clock gating | With clock gating |
|--------------------|----------------------|-------------------|
| Area               | 165084               | 144180            |
| Cells              | 938                  | 691               |
| Dynamic power (nW) | 124677271.350        | 109151600.949     |
| Total power (nW)   | 124677325.047        | 109151648.992     |
| Timing slack (ps)  | -890                 | -896              |
| Rise skew (ps)     | 28.6                 | 185.2             |
| Fall skew (ps)     | 28.6                 | 150.7             |

TABLE 1 Result for gated and ungated design
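The relative change for each metric in Table 1 can be recomputed directly from the tabulated values. The short Python sketch below does this for area, cell count and dynamic power; the helper name `percent_change` is ours and is introduced only for illustration.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Values copied from Table 1: (without clock gating, with clock gating).
table1 = {
    "area": (165084, 144180),
    "cells": (938, 691),
    "dynamic_power_nW": (124677271.350, 109151600.949),
}

changes = {name: percent_change(before, after)
           for name, (before, after) in table1.items()}

for name, change in changes.items():
    print(f"{name}: {change:+.2f} %")
```

Negative values indicate a reduction in the gated design relative to the ungated one.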

TABLE 2 Result for split and merge design

|                    | Clock gating (split) | Clock gating (merge) |
|--------------------|----------------------|----------------------|
| Area               | 150016               | 145976               |
| Cells              | 771                  | 720                  |
| Dynamic power (nW) | 132293051.98         | 127975159.581        |
| Total power (nW)   | 132293102.67         | 127975209.166        |
| Timing slack (ps)  | -917                 | -833                 |
| Rise skew (ps)     | 143.4                | 196.7                |
| Fall skew (ps)     | 155.8                | 247.9                |

# IX. CONCLUSION:

This paper demonstrates split and merge of clock gating elements on the clock tree of a gated design. Clock gating is first applied to the design, and split and merge are then performed to optimize the result. Relative to the gated design, the split design shows a lower rise skew and the merge design shows a smaller negative slack, but both show an increase in power. The increase in power is due to the additional clock gating elements, an overhead that is expected to diminish when clock gating is applied to the design as a whole. The dynamic power reported is the overall power of the design and not the power of the gated logic alone. This method is best suited to designs with millions of transistors. Compared with the ungated design, clock gating decreases area, power and cell count.

# **X. REFERENCES:**

- [1]. R.-S. Tsay, "An Exact Zero Skew Clock Routing Algorithm", IEEE Transactions on CAD/ICAS, Vol. 12, No. 2, pp. 242-249.
- [2]. Monica Donno, Enrico Macii, Luca Mazzoni, "Power-aware clock tree planning", Proc. of the 2004 International Symposium on Physical Design, 2004.
- [3]. Qi Wang and Sumit Roy, "Power Minimization by Clock Root Gating", Design Automation Conference, 2003. Proceedings of the ASP-DAC 2003. Asia and South Pacific, 21-24 Jan. 2003, pp. 249-254.
- [4]. Lower Supply Voltages Enable Low-Power Portable Electronic Devices. www.lowpowerdesign.com/article\_microchip\_rao.htm.
- [5]. Zhijian Lu, John Lach, "Temperature-Aware Modeling and Banking of IC Lifetime Reliability", Dept. of Electrical and Computer Engineering, University of Virginia.
- [6]. ASIC-SOC-VLSI design. asicsoc.blogspot.com/2008/04/low-power-designtechniques
- [7]. Azuro PowerCentric: clock tree synthesis for clock gated design. http://www.azuro.com/powercentric/


# AUTHOR



**K. Hariharan** is currently working as Assistant Professor in the Electronics and Communication Engineering Department, SASTRA University, Thanjavur, India. He received his B.E. degree in Electronics and Communication Engineering from SASTRA University, SRC Campus, Kumbakonam, India, and the M.Tech. degree in VLSI Design from SASTRA University, Thanjavur, India. His current research interests include algorithms and VLSI architectural strategies for high speed DSP applications.

# CO-AUTHOR

**C. Jayakumar** is currently pursuing the M.Tech in VLSI Design at the School of Computing, SASTRA University, Thanjavur, India. He received his B.E. degree in Electronics and Communication Engineering in 2010 from Adhiparasakthi Engineering College, Anna University, India. His current research interests include low power multipliers, and he is currently involved in R&D activities in the VLSI domain. His interests also include VLSI backend design and MEMS-based medical electronic ASIC designs.