# Low Power Bus Coding Techniques Considering Inter-wire Capacitances

Paul P. Sotiriadis Department of EECS Massachusetts Institute of Technology Cambridge, MA 02139 pps@mit.edu

## **1. ABSTRACT**


The power dissipation associated with driving data busses can be significant, especially considering the increasing component of inter-wire capacitance. Previous work on bus encoding has focused on minimizing transitions to reduce power dissipation. In this paper, it is shown that transition reduction is not necessarily the best approach for reducing power when the effects of inter-wire capacitance are considered. An electrical model for data busses designed with submicron technologies is presented and a family of coding techniques is proposed that can reduce the average power consumption of the bus by 40%.

#### 2. INTRODUCTION

An important component of the power consumption in digital processors involves the transmission of data through high-capacitance busses. Several techniques have been proposed to reduce the power dissipation in these busses, including coding, low-swing signaling, and charge recycling [1] [2] [3]. Reduced-swing communication is attractive but suffers from a reduced signal-to-noise ratio, especially as power supply voltages are scaled [4][5][6][7]. An excellent comparison of various low-swing techniques is presented in [1].

The coding of data has been explored for reducing power dissipation through the reduction of transitions. The basic idea is to add redundancy to encode the data on the bus. One effective approach proposed is the bus-invert technique in which the data bus is conditionally inverted to reduce the overall transitions [8]. If more than 50% of the bits change, the whole bus is inverted. Therefore, in addition to the data, an extra bit must be transmitted to indicate if the bus is inverted. In [9], temporal data correlations are exploited to reduce switching activity. Fundamental bounds on transition reduction are developed.
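The bus-invert rule described above can be sketched in a few lines of Python. This is a minimal illustration based only on the description in this section, not the implementation of [8]; the function names `bus_invert_encode` and `bus_invert_decode` are hypothetical, and the sketch ignores transitions on the extra invert line itself.

```python
def bus_invert_encode(prev_bus, data):
    """Return (bus_word, invert_bit) for one transfer.

    prev_bus: tuple of bits currently on the data lines (as transmitted).
    data:     tuple of bits to send next.
    """
    n = len(data)
    # Number of data lines that would toggle if the word were sent unmodified.
    toggles = sum(p != d for p, d in zip(prev_bus, data))
    if 2 * toggles > n:  # more than 50% of the bits change
        return tuple(1 - d for d in data), 1
    return tuple(data), 0


def bus_invert_decode(bus_word, invert_bit):
    """Undo the conditional inversion using the extra transmitted bit."""
    if invert_bit:
        return tuple(1 - b for b in bus_word)
    return tuple(bus_word)


word, inv = bus_invert_encode((0, 0, 0, 0), (1, 1, 1, 0))
recovered = bus_invert_decode(word, inv)
```

Here three of the four lines would toggle, so the word is sent inverted and only one data line changes.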

## **3. BUS AND DRIVER MODEL**

Previous work on data bus coding uses a simple electrical model in which the lines are replaced by lumped grounded capacitors. Although this model is convenient for closed-form mathematical analysis of the power consumption, it is not appropriate for submicron technologies. The smaller scales make the distributed nature of the lines non-negligible and, more importantly, the lines are placed closer to each other, so the inter-wire capacitances  $C_I$  become important with respect to the substrate capacitances  $C_L$  (see Figure 1).

In Figure 1, the lines are shown along with their dominant parasitic elements and their voltages  $V_1, V_2, ..., V_n$ . Here  $C_L$  is the parasitic capacitance to the substrate or to other nearby elements with constant potentials, and  $C_I$  is the parasitic capacitance between adjacent lines. The values of all  $C_L$ 's and  $C_I$ 's are assumed not to vary from node to node. In this model, the power consumption depends not only on the transitions of the voltages  $V_1, V_2, ..., V_n$  but also on the relative transitions of the voltages of adjacent lines. This implies that the transition activity by itself is no longer the appropriate "measure" of power consumption, which complicates the exact calculation of the power consumption (under the standard assumption that the data sequences are white noise of 0's and 1's). In future work, we will extend this model to include the distributed nature of the wire.

Anantha Chandrakasan Department of EECS Massachusetts Institute of Technology Cambridge, MA 02139 anantha@mtl.mit.edu



Figure 1: Simple model for the data bus.

In order to calculate the energy consumption caused by a transition using the above model, we need a model for the drivers of the lines too. We adopt the following simple one.



Figure 2: Equivalent circuits for the line drivers.

Figure 2a corresponds to the output of the line driver when it applies a low signal to the line. The resistor R is the *on* resistance of the NMOS. Similarly, in Figure 2b, R is the *on* resistance of the PMOS. The two resistors are not assumed to be constant (as in the case described in [10] for a CMOS inverter driving a lumped capacitance).

#### 4. DERIVATION OF THE ENERGY FUNCTION

Combining the model of the data bus with that of the drivers, we obtain the circuit in Figure 3. The buffers are replaced by resistors that are connected either to the power supply  $(V_{dd})$  or to GND.




Figure 3: Drivers and lines combined.

The voltages  $V_1^f, V_2^f, ..., V_n^f$  take the values 0 or  $V_{dd}$  depending on the *logical* values of the lines we want to establish. Setting

$$\lambda = \frac{C_I}{C_L} \tag{1}$$

we have the following equations describing the operation of the circuit, where  $k = 2, ..., n - 1$  in Equation 3:

$$C_L \cdot \left[ (1+\lambda) \cdot \frac{dV_1}{dt} - \lambda \cdot \frac{dV_2}{dt} \right] = \frac{V_1^f - V_1}{R_1(t)} \tag{2}$$

$$C_L \cdot \left[ -\lambda \cdot \frac{dV_{k-1}}{dt} + (1+2\lambda) \cdot \frac{dV_k}{dt} - \lambda \cdot \frac{dV_{k+1}}{dt} \right] = \frac{V_k^f - V_k}{R_k(t)} \tag{3}$$

$$C_L \cdot \left[ -\lambda \cdot \frac{dV_{n-1}}{dt} + (1+\lambda) \cdot \frac{dV_n}{dt} \right] = \frac{V_n^f - V_n}{R_n(t)} \tag{4}$$

Now let

$$V_1(0) = V_1^i, \dots, V_n(0) = V_n^i \tag{5}$$

be the initial values of the voltages  $V_1, V_2, ..., V_n$ , where each of the  $V_1^i, V_2^i, ..., V_n^i$  is either 0 or  $V_{dd}$ . We make the reasonable assumption that the clock period is long enough for the voltages to settle to their new values  $V_1^f, V_2^f, ..., V_n^f$ , so

$$V_1(\infty) = V_1^f, \dots, V_n(\infty) = V_n^f \tag{6}$$

The energy consumed (or deposited) by the driver k during the transition is:

$$E_k = \int_0^\infty V_k^f \cdot \left(\frac{V_k^f - V_k(t)}{R_k(t)}\right) dt = V_k^f \cdot \int_0^\infty \left(\frac{V_k^f - V_k(t)}{R_k(t)}\right) dt \tag{7}$$

Integrating Equations 2, 3, and 4 from 0 to infinity and using Equations 5, 6, and 7 we get,

$$E_1 = C_L \cdot [(1 + \lambda) \cdot (V_1^f - V_1^i) - \lambda \cdot (V_2^f - V_2^i)] \cdot V_1^f \tag{8}$$

$$E_k = C_L \cdot [-\lambda \cdot (V_{k-1}^f - V_{k-1}^i) + (1 + 2\lambda) \cdot (V_k^f - V_k^i) - \lambda \cdot (V_{k+1}^f - V_{k+1}^i)] \cdot V_k^f \tag{9}$$

$$E_n = C_L \cdot [-\lambda \cdot (V_{n-1}^f - V_{n-1}^i) + (1+\lambda) \cdot (V_n^f - V_n^i)] \cdot V_n^f \tag{10}$$
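As a worked step, the integration can be written out for the first line. Integrating Equation 2 over  $[0, \infty)$  and applying the boundary values of Equations 5 and 6 gives

$$\int_0^\infty \frac{V_1^f - V_1(t)}{R_1(t)}\, dt = C_L \cdot \left[ (1+\lambda) \int_0^\infty \frac{dV_1}{dt}\, dt - \lambda \int_0^\infty \frac{dV_2}{dt}\, dt \right] = C_L \cdot [(1+\lambda) \cdot (V_1^f - V_1^i) - \lambda \cdot (V_2^f - V_2^i)]$$

Multiplying by  $V_1^f$ , as in Equation 7, yields Equation 8. The time-varying resistance  $R_1(t)$  drops out of the result, so under the settling assumption the energy depends only on the initial and final voltages and not on the switching trajectory of the driver.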

Hence the total energy consumed during the transition is given by the sum,

$$E = \sum_{r=1}^{n} E_r = \sum_{r=1}^{n} E_r^L + \lambda \sum_{j=1}^{n-1} E_j^I \tag{11}$$

where,

$$E_r^L = C_L \cdot V_r^f \cdot (V_r^f - V_r^i) \tag{12}$$

and

$$\begin{aligned} E_j^I &= C_L \cdot [(V_{j+1}^f - V_j^f) \cdot (V_{j+1}^f - V_{j+1}^i + V_j^i - V_j^f)] \\ &= C_L \cdot [(V_{j+1}^f - V_j^f)^2 + (V_{j+1}^f - V_j^f) \cdot (V_j^i - V_{j+1}^i)] \end{aligned} \tag{13}$$
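The energy function of Equations 11, 12, and 13 translates directly into code. The following is a minimal sketch, assuming line values are given as 0/1 and the result is expressed in units of  $C_L \cdot V_{dd}^2$ ; the function name `transition_energy` is hypothetical.

```python
def transition_energy(initial, final, lam):
    """Energy of one bus transition per Equations 11-13.

    initial, final: sequences of 0/1 line values before and after the transition.
    lam:            the ratio C_I / C_L (lambda in Equation 1).
    Returns the energy in units of C_L * V_dd**2.
    """
    n = len(final)
    # Equation 12: line-to-substrate term, summed over all lines.
    e_line = sum(final[r] * (final[r] - initial[r]) for r in range(n))
    # Equation 13: inter-wire term, summed over adjacent pairs (j, j+1).
    e_inter = sum(
        (final[j + 1] - final[j])
        * (final[j + 1] - initial[j + 1] + initial[j] - final[j])
        for j in range(n - 1)
    )
    return e_line + lam * e_inter


same_direction = transition_energy((0, 0), (1, 1), 2)  # both lines rise
opposite = transition_energy((0, 1), (1, 0), 2)        # lines swap values
```

Both transitions toggle two lines, yet with  $\lambda = 2$  the first costs 2 units and the second costs 5, which illustrates why the transition count alone is not an adequate measure in this model.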

It should be mentioned that if we normalize the energy function by setting  $C_L = 1$  and  $V_{dd} = 1$ , then the multiplications of the form  $V_x^y \cdot V_z^w$  for any x, y, z, w, are reduced to logical AND operations. The calculation of the energy *E* is then completed by counting *ones* and doing five additions and one multiplication (choosing  $\lambda$  to be a convenient rational number could save a lot of complexity in the multiplication).

#### 5. BUS CODING TECHNIQUES

For a data bus with *m* lines, one way of coding for low power is first to expand it by adding *a* more lines and then to modify the driving circuit of the bus so that only the words of a subset Z of the (total) set  $T = \{0, 1\}^{m+a}$  are transmitted. The subset Z must contain at least  $2^m$  words whose average energy cost of use is less than that of the original bus.

It is a principle of coding theory that the larger the data set to be compressed, the higher the efficiency of the compression. The same is true for coding techniques that achieve power reduction in data busses. Suppose that we could afford a large latency in the data transmission. Then we could store a long sequence of data, process it, and then transmit it through the data bus. This in general would allow for very efficient power saving algorithms. Unfortunately, a large latency is not usually acceptable. Therefore we propose a family of coding schemes with minimal latency. Let

$$D(k) = (d_1(k), d_2(k), \dots, d_m(k)) \tag{14}$$

be the data vector that must be encoded and transmitted at time k and let

$$L(k) = (l_1(k), l_2(k), \dots, l_n(k)) \tag{15}$$

be the vector containing the logical values of the bus lines at time k. We set n = m + a and decompose the vector L(k) into two parts. The *data* part,

$$L^{D}(k) = (l_{1}(k), l_{2}(k), \dots, l_{m}(k)) \tag{16}$$

and the *code* part,

$$L^{C}(k) = (l_{m+1}(k), l_{m+2}(k), \dots, l_{n}(k)) \tag{17}$$






so,  $L(k) = (L^{D}(k), L^{C}(k))$ . In Figure 4, the encoder and decoder schemes are shown.

The function E is the energy function described above. The E-blocks in Figure 4 calculate the cost of transitions from the current state L(k-1) of the extended bus to its possible new states

$$L(k) = (D(k) \oplus P_r, J_r) \tag{18}$$

for  $r = 0, 1, ..., 2^a - 1$ . The  $J_r$ 's are the expressions of the index r in binary form, i.e.

$$J_0 = (0, \dots, 0), \dots, J_{2^a - 1} = (1, \dots, 1) \tag{19}$$

Finally, the *control* function C is defined to give as output one of the  $J_r$ 's for which the transition from L(k-1) to L(k) has the *minimum* possible cost with respect to r.
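The encoder and decoder described above can be sketched as follows. This is an illustrative sketch rather than the circuit of Figure 4: the names `encode`, `decode`, and `transition_energy` are hypothetical, energies are in units of  $C_L \cdot V_{dd}^2$ , the set of vectors  $P_r$  is passed in as a list indexed by r, and ties between equally cheap candidates are broken in favor of the smallest r, which is a choice of this sketch.

```python
def transition_energy(initial, final, lam):
    """Equations 11-13 with C_L = V_dd = 1."""
    n = len(final)
    e_line = sum(final[r] * (final[r] - initial[r]) for r in range(n))
    e_inter = sum(
        (final[j + 1] - final[j])
        * (final[j + 1] - initial[j + 1] + initial[j] - final[j])
        for j in range(n - 1)
    )
    return e_line + lam * e_inter


def xor(u, v):
    return tuple(x ^ y for x, y in zip(u, v))


def encode(prev_bus, data, patterns, lam):
    """Pick the new state (D xor P_r, J_r) of Equation 18 with the lowest cost.

    prev_bus: current state L(k-1) of the extended bus (m + a bits).
    patterns: list of the 2**a vectors P_r, each of length m.
    """
    a = (len(patterns) - 1).bit_length()
    best_cost, best_state = None, None
    for r, p in enumerate(patterns):
        # J_r: the index r written in binary, most significant bit first.
        j_r = tuple((r >> i) & 1 for i in reversed(range(a)))
        state = xor(data, p) + j_r
        cost = transition_energy(prev_bus, state, lam)
        if best_cost is None or cost < best_cost:
            best_cost, best_state = cost, state
    return best_state


def decode(bus, patterns):
    """Recover D(k) from L(k) by reading r from the code part."""
    a = (len(patterns) - 1).bit_length()
    m = len(bus) - a
    r = int("".join(map(str, bus[m:])), 2) if a else 0
    return xor(bus[:m], patterns[r])


# With a = 1 and P_1 = all ones, the scheme behaves like a bus-invert code.
patterns = [(0, 0, 0, 0), (1, 1, 1, 1)]
state = encode((0, 0, 0, 0, 0), (1, 1, 1, 1), patterns, 0.0)
data_out = decode(state, patterns)
```

Because the decoder only needs the code part of L(k) to look up  $P_r$ , the scheme adds no latency beyond the selection of the cheapest candidate.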

## 5.1 A Special case

If the ratio  $\lambda$  is on the order of one or greater, an important part of the energy consumption *E* is due to the interaction of adjacent lines. This motivates a choice of  $P_r$ ,  $r = 0, ..., 2^a - 1$ , that "breaks" sequences of data patterns that would cause a lot of energy consumption. We give a class of such sets that has interesting energy saving results. In this case, the whole coding scheme can be thought of as a generalization of the Bus-Invert technique [8].

We have chosen  $P_r$ ,  $r = 0, ..., 2^a - 1$  to be the  $2^a$  different vectors that can be written as linear combinations of the following *basis* vectors  $Y_1, Y_2, ..., Y_a$ . In other words,

$$P_{r} = (r_{1} \cdot Y_{1}) \oplus (r_{2} \cdot Y_{2}) \oplus \dots \oplus (r_{a} \cdot Y_{a}) \tag{20}$$

where  $r = (r_{1}, r_{2}, \dots, r_{a})$ ,  $r_{i} \in \{0, 1\}$ .

| Basis vector   | Bit pattern (leading bits)          |
|----------------|-------------------------------------|
| Y<sub>1</sub>  | 11111111111111111111111111111111... |
| Y<sub>2</sub>  | 01010101010101010101010101010101... |
| Y<sub>3</sub>  | 00110011001100110011001100110011... |
| Y<sub>4</sub>  | 00001111000011110000111100001111... |
| Y<sub>5</sub>  | 00000000111111110000000011111111... |
| ...            | ...                                 |
| Y<sub>a</sub>  | 00000000000000000000000000000000... |

Table 1: Basis vectors for the encoder-decoder.
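The construction of the vectors  $P_r$  from the basis vectors can be sketched as below. The sketch assumes the pattern suggested by the first rows of Table 1:  $Y_1$  is all ones and, for  $i \ge 2$ ,  $Y_i$  alternates runs of zeros and ones of length  $2^{i-2}$ , truncated to *m* bits. It also assumes that  $r_1$  is the most significant bit of the index r. Both conventions, and the names `basis_vector` and `pattern_set`, are assumptions of this illustration.

```python
def basis_vector(i, m):
    """Y_i truncated to m bits (i is 1-based), following the pattern of Table 1."""
    if i == 1:
        return tuple([1] * m)
    run = 2 ** (i - 2)  # run length: 1 for Y_2, 2 for Y_3, 4 for Y_4, ...
    return tuple((pos // run) % 2 for pos in range(m))


def pattern_set(m, a):
    """All P_r of Equation 20, indexed by r = (r_1, ..., r_a) read as binary."""
    basis = [basis_vector(i, m) for i in range(1, a + 1)]
    patterns = []
    for r in range(2 ** a):
        bits = [(r >> (a - 1 - i)) & 1 for i in range(a)]  # r_1 is the MSB
        p = [0] * m
        for r_i, y in zip(bits, basis):
            if r_i:
                # XOR the selected basis vector into the running combination.
                p = [x ^ y_k for x, y_k in zip(p, y)]
        patterns.append(tuple(p))
    return patterns


p_small = pattern_set(4, 2)
```

Note that when *m* is small relative to *a*, a truncated basis vector can become all zeros under this assumed pattern, in which case the  $2^a$  vectors produced are no longer distinct.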

## 6. SIMULATIONS

The encoder-decoder scheme described in Section 5 was simulated using the class of sets  $P_r$ ,  $r \in \{0, 1\}^a$ , defined above. A sequence of 5000 *m*-bit words was used in every simulation. The bits of the words were realizations of independent random variables uniformly distributed in  $\{0, 1\}$ . Simulations were run for all combinations of the parameters m = 4, 8, 16, 32, 64, a = 1, 2, 3, 4 and  $\lambda = 0, 1, 2, \infty$ . The figures below show the percentage of energy saved when the coding technique is used, which is equal to

$$100\left(1 - \frac{E_c}{E_0}\right)\%\tag{21}$$

where  $E_c$  is the total energy consumed when the coding was used and  $E_0$  is the total energy consumed without coding. The energies were calculated using Equations 11, 12, and 13. In this initial research, the power consumption of the encoder and decoder circuits was ignored. This approximation remains valid for very high capacitance data busses and for small values of the parameter *a*, where the complexity of the circuits is low. The results presented here, however, provide a bound on the energy savings.
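A simulation of this kind can be sketched as follows. This is not the authors' simulation code: the function names, the all-zero initial bus state, the fixed random seed, and the assumed run-length form of the basis vectors are choices of this sketch. It handles finite  $\lambda$  only; the case  $\lambda = \infty$  would require comparing the inter-wire terms alone. As in the text, encoder and decoder energy is ignored.

```python
import random


def transition_energy(initial, final, lam):
    """Equations 11-13 in units of C_L * V_dd**2."""
    n = len(final)
    e_line = sum(final[r] * (final[r] - initial[r]) for r in range(n))
    e_inter = sum(
        (final[j + 1] - final[j])
        * (final[j + 1] - initial[j + 1] + initial[j] - final[j])
        for j in range(n - 1)
    )
    return e_line + lam * e_inter


def pattern_set(m, a):
    """P_r for r = 0..2**a - 1, built from run-length basis vectors (assumed)."""
    basis = [tuple([1] * m)] + [
        tuple((pos >> i) & 1 for pos in range(m)) for i in range(max(a - 1, 0))
    ]
    basis = basis[:a]
    patterns = []
    for r in range(2 ** a):
        p = (0,) * m
        for i, y in enumerate(basis):
            if (r >> (a - 1 - i)) & 1:
                p = tuple(x ^ y_k for x, y_k in zip(p, y))
        patterns.append(p)
    return patterns


def energy_savings(m, a, lam, words=5000, seed=0):
    """Percentage saving of Equation 21 for uniformly random m-bit words."""
    rng = random.Random(seed)
    patterns = pattern_set(m, a)
    plain = (0,) * m        # state of the uncoded bus
    coded = (0,) * (m + a)  # state L(k-1) of the extended bus
    e_0 = e_c = 0.0
    for _ in range(words):
        data = tuple(rng.randint(0, 1) for _ in range(m))
        e_0 += transition_energy(plain, data, lam)
        plain = data
        # Evaluate every candidate state (D xor P_r, J_r) and keep the cheapest.
        best_cost, best_state = None, None
        for r, p in enumerate(patterns):
            j_r = tuple((r >> (a - 1 - i)) & 1 for i in range(a))
            state = tuple(d ^ q for d, q in zip(data, p)) + j_r
            cost = transition_energy(coded, state, lam)
            if best_cost is None or cost < best_cost:
                best_cost, best_state = cost, state
        e_c += best_cost
        coded = best_state
    return 100.0 * (1.0 - e_c / e_0)


saving = energy_savings(m=8, a=2, lam=1.0, words=1000)
```

The selection is greedy, one word at a time, which matches the minimal-latency requirement of Section 5; the numbers it produces depend on the assumptions listed above and are not a reproduction of the figures.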

#### 7. CONCLUSIONS

In this paper, a new coding technique for low power has been presented. It is based on a model for the data bus that explicitly considers the inter-wire capacitance. It was observed that the power dissipation is no longer minimized by simply minimizing the transition activity. Depending on the technology parameters (i.e., the size of the inter-wire capacitance relative to the substrate capacitance), our proposed coding technique could save up to 40% of the power consumed by the drivers of the lines.





Figure 8: Energy savings with  $\lambda = \infty$ .

#### 8. ACKNOWLEDGEMENTS

The authors acknowledge support from the MARCO Focused Research Center on Interconnects, which is funded at the Massachusetts Institute of Technology through a subcontract from the Georgia Institute of Technology. Paul Sotiriadis is partially supported by the Alexander S. Onassis Public Benefit Foundation, the Greek Section of Scholarships and Research.

## 9. REFERENCES

- [1] A. Chandrakasan, R. Brodersen, *Low Power CMOS Design*, IEEE Press, 1998.
- [2] K. Y. Khoo, A. Willson, Jr., "Charge recovery on a databus," *IEEE/ACM International Symposium on Low Power Electronics and Design*, pp. 185-189, 1995.
- [3] B. Bishop, M. J. Irwin, "Databus charge recovery: Practical consideration," *International Symposium on Low Power Electronics and Design*, pp. 85-87, August 1999.
- [4] Y. Nakagome, K. Itoh, M. Isoda, K. Takeuchi, M. Aoki, "Sub-1-V swing internal bus architecture for future low-power ULSI's," *IEEE Journal of Solid-State Circuits*, pp. 414-419, April 1993.
- [5] M. Hiraki, H. Kojima, H. Misawa, T. Akazawa, Y. Hatano, "Data-dependent logic swing internal bus architecture for ultralow-power LSI's," *IEEE Journal of Solid-State Circuits*, pp. 397-401, April 1995.
- [6] H. Yamauchi, H. Akamatsu, T. Fujita, "An asymptotically zero power charge-recycling bus architecture for battery-operated ultrahigh data rate ULSI's," *IEEE Journal of Solid-State Circuits*, pp. 423-431, April 1995.
- [7] H. Zhang, J. Rabaey, "Low swing interconnect interface circuits," *IEEE/ACM International Symposium on Low Power Electronics and Design*, pp. 161-166, August 1998.
- [8] M. Stan, W. Burleson, "Low-power encodings for global communication in CMOS VLSI," *IEEE Transactions on VLSI Systems*, pp. 49-58, Vol. 5, No. 4, Dec. 1997.
- [9] S. Ramprasad, N. Shanbhag, I. Hajj, "A coding framework for low-power address and data busses," *IEEE Transactions on VLSI Systems*, pp. 212-221, Vol. 7, No. 2, June 1999.
- [10] J. M. Rabaey, *Digital Integrated Circuits*, Prentice Hall, 1996.
