# An Information Theory Approach to Power - Optimal Traffic Routing in Networks on Chips

Paul P. Sotiriadis Electrical and Computer Engineering Johns Hopkins University

Abstract—Power - optimal traffic routing in bus - networks is studied from an information theoretic perspective. Power consumption in bus - networks is mathematically related to traffic patterns leading to theoretically optimal routing strategies and ultimate performance limits. Networks-on-Chip depending strongly on efficient communication schemes motivate this work.

# I. INTRODUCTION

Network-on-Chip (NoC) is a new paradigm of data processing structure that is expected to lead the way in high performance digital design in the future. It arises naturally through the maturity of the Deep-Sub-Micron (DSM) technologies and the upcoming nano-technology devices [1]-[2].

As indicated by their name, NoCs are heavily interconnected structures with highly complicated and sophisticated data communication networks [2]-[3].

Power consumption in on-chip buses and the general global on-chip signaling network has been an important issue in the design and performance of advanced microprocessors during the last decade [4], and several power reduction techniques have been proposed [5]-[11]. Given the NoC design direction it is expected that bus-network power optimization will play a crucial role in the future as well.

This paper introduces an information theoretic framework for deriving the power-optimal performance of on-chip communication networks. It is based on the analytical derivations of the optimal power vs. rate performance of buses presented in [12] and [13]. Although the concepts in this paper are not tied to any particular bus technology, the general bus-energy in [14] has been used to generate concrete examples.

The discussion focuses on optimal routing of the communication traffic in bus-networks that minimizes power consumption while maintaining the desirable bit rates. The problem is approached from an abstract perspective.

#### II. FROM BUSES TO NETWORKS

To understand the relationship between information flow and power consumption in on-chip communication networks it is important that we start with an appropriate model of the same relationship in single buses. The exact design of the buses is not important as long as they have an energy cost function depending on single step transitions. The buses may have forward encoding schemes for error detection/correction as well.

#### III. BUSES: POWER VS. INFORMATION RATE MODEL

DSM technology buses suffer from major inter-line parasitics. A commonly used bus-energy model for energy calculation is shown in Figure 1 [14].



Fig. 1. Simplified energy-equivalent model for typical shielded DSM buses

Here,  $C_L$  is the parasitic capacitance between a line and ground (or surroundings) and  $C_I$  is the inter-line capacitance, practically limited between successive lines. It is convenient to write  $C_I = \lambda C_L$  for the appropriate value  $\lambda$ . Dynamic energy loss dominates in most buses and is given by Lemma 3.1.

Lemma 3.1: [13]-[14] The energy loss during the transition, of the *n*-line bus, from state  $X = (x_1, x_2, \dots, x_n)^T$  to state  $Y = (y_1, y_2, \dots, y_n)^T$ ,  $x, y \in \{0, 1\}^n$  is:

$$E_{DSM}(X,Y) = \frac{V_{cc}^2 C_L}{2} \left(Y - X\right)^T \mathcal{A} \left(Y - X\right)$$
(1)

where 
$$\mathcal{A} = \begin{bmatrix} 1+2\lambda & -\lambda & 0 & \cdots & 0 \\ -\lambda & 1+2\lambda & -\lambda & \cdots & 0 \\ 0 & -\lambda & 1+2\lambda & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1+2\lambda \end{bmatrix}$$
.

In general, let E be a bus-energy function and E(X, Y) be the energy cost of the transition from state X to state Y.

Definition 3.1: If the *n*-bit vectors X(k), k = 1, 2, ... transmitted through the *n*-line bus are random, i.e., if X is a stochastic sequence, the expected energy per clock cycle is <sup>1</sup>

$$\bar{E}(X) = \lim_{m \to \infty} \frac{1}{m} \sum_{k=1}^{m} \mathcal{E}\left\{E(X(k-1), X(k))\right\}$$
(2)

Energy consumption in the bus is associated with the transmitted information [9], [12], [13], [16]. Consider an *n*-line bus and a sequence of transmitted *n*-bit vectors, X(k), k =1, 2, ... Data (or address)<sup>2</sup> sequences are rarely temporally uncorrelated; that is, in almost all cases, vector X(k) has some statistical dependance to the previously transmitted vectors X(1), X(2), ..., X(k - 1). This statistical dependance corresponds to redundancy which, in principle, can be removed from the data sequence using coding, resulting in a smaller-in-size sequence carrying the same amount (or rate) of information [15].

<sup>1</sup>To avoid technicalities we assume that  $X = \{X(k)\}_k$  is stationary. <sup>2</sup>From the information theoretic perspective, data and addresses are both "information" data. The *entropy rate*,  $\mathcal{H}(X)$ , is an information theoretic metric of the "pure" amount of information carried by stochastic sequence X, per clock cycle [15]. It is

$$\mathcal{H}(X) = \lim_{m \to \infty} \frac{H_m(X(1), X(2), \dots, X(m))}{m}$$
(3)

where  $H_m(X(1), X(2), \ldots, X(m))$  is the entropy of the partial sequence  $X(1), X(2), \ldots, X(m)$ , that is:

$$H_m(X(1), X(2), \dots, X(m)) = -\sum_{X(1), X(2), \dots, X(m)}$$
  
Pr(X(1), X(2), \dots, X(m)) log<sub>2</sub> (Pr(X(1), X(2), \dots, X(m)))

In the sense of the Shannon-McMillan-Breiman theorem,  $\mathcal{H}(L)$  equals the expected number of bits needed to express the information content of the vector  $X(k)^3$  [15]. Moreover, under certain but general conditions<sup>3</sup>, two stochastic sequences having the same entropy rate can be considered equivalent to each other in the sense that in principle we can find coding schemes mapping one to the other [15]. Based on this we have the following definition.

Definition 3.2: The minimum average energy required per clock cycle to transmit  $m, 0 \le m \le n$ , information bits, on average, per clock cycle is

$$E^*(m) = \min_{X: \mathcal{H}(X)=m} \bar{E}(X) \tag{4}$$

The following theorem provides us with the lower limit of communication energy.

Theorem 3.1: [12]-[13] Consider an *n*-line bus with energy function<sup>4</sup> E. The minimum average energy, per clock cycle, for transmitting, on average, m,  $0 \le m \le n$ , information bits, per clock cycle, is  $E^*(m) = \frac{1}{\mu} g^T S g$  where  $\gamma$  is the positive solution of the equation

$$m = \frac{1}{\ln(2)} \left( \ln(\mu) + \frac{\gamma}{\mu} g^T S g \right), \tag{5}$$

 $\mu = \mu(\gamma)$  is the maximal eigenvalue of the matrix

$$W(\gamma) = \left[e^{-\gamma E(x,y)}\right]_{x,y=0}^{2^{n}-1},$$
(6)

g is the corresponding normalized eigenvector,  $\|g\|_2 = 1$ , and

$$S(\gamma) = \left[ E(x, y) e^{-\gamma E(x, y)} \right]_{x, y=0}^{2^n - 1}.$$
 (7)

*Example 3.1:* Consider a DSM bus with energy model like that in Figure 1. The transition energy cost is given by equation 1 and for simplicity we set  $V_{cc}^2 C_L = 1$  and n = 8. The minimum (normalized) energy per clock cycle as a function of the entropy rate is shown in Figure 2 for  $\lambda = 3, 5$  and 10.

The maximum (top right) of the curves equals the expected energy per cycle,  $E_u$ , when the transmitted sequence X is formed out of independent random variables uniformly distributed in  $\{0, 1\}^n$ . It is

$$E_u = \frac{n}{4} (1+2\lambda) V_{cc}^2 C_L$$
 (8)

<sup>3</sup>e.g. when X(k) is ergodic.

<sup>4</sup>For simplicity it is assumed that E is symmetric.



Fig. 2. Minimum energy per cycle for entropy rate m.

# IV. POWER VS. RATE APPROXIMATE MODEL

Function  $E^*$  provided by Theorem 3.1 can be approximated by (9) which is simpler and easier to handle analytically.

$$E^*(m) \cong \left(1 - \sqrt[3]{1 - m/n}\right) E_u \tag{9}$$

Figure 3 shows the exact and approximate values of  $E^*$  for the case  $\lambda = 5$  of example 3.1. It is  $\phi(m) = 1 - \sqrt[3]{1 - m/n}$ .



Fig. 3.  $E^*$  and its approximation for DSM bus with n = 8 and  $\lambda = 5$ .

Different buses may run at different clock frequencies, have different widths and perhaps different coding schemes. Instead of the entropy rate m and minimum energy per cycle  $E^*$  it is convenient to define and use the *rate* and *minimum power*.

Definition 4.1: The rate R of a bus is R = mf, where m is the average number of information bits transmitted per clock cycle and f is the frequency of the bus clock. The maximum rate is  $R_m = nf$  where n is the width of the bus. The minimum power consumption (MPC) of the bus is  $P = E^*(m)f$  where  $E^*(m)$  is the minimum energy per clock cycle at entropy rate m, given by Theorem 3.1.

Using the bus-energy model in Section III equations (8) and (9) result in  $P \cong \Theta(R, R_M)$  where

$$\Theta(R, R_M) = \beta R_M \left( 1 - \sqrt[3]{1 - R/R_M} \right)$$
(10)

and factor  $\beta$  is the same for all buses. Without loss of generality we set  $\beta = 1$ .  $R_M$  is the maximum rate of the bus.

394

## V. NETWORKS: TRAFFIC ROUTING POWER OPTIMIZATION

The information flow through buses is assumed optimal, in the sense that rate (entropy rate) and power (energy per cycle) are related as in Theorem 3.1 and are approximated by (10).

*Case 5.1:* Consider the simple case in Figure 4 where node 1 transmits data to node 2 at a rate R. Two identical buses (thick gray strip with arrow indicating the direction of the bus) with maximum rate  $R_m$  are available and dedicated to this purpose.



Fig. 4. Two identical and dedicated buses from node 1 to node 2

The data traffic can be split into the two buses and the data can be recombined at the other end. The following constraints must be satisfied  $0 \le R \le 2R_M$ ,  $0 \le R_i \le R_M$  and  $R_1 + R_2 = R$ ; the total power is  $P_T = \Theta(R_1, R_M) + \Theta(R_2, R_M)$ .



Fig. 5. Total power consumption of the scheme in Figure 4

Figure 5 (dashed surface) shows the total power consumption of the configuration in Figure 4 as a function of the rates  $R_1$  and  $R_2$ . It has been set  $R_M = 1$  and all variables are considered dimensionless. The solid lines indicate the intersection of the surface with the plane  $R_1 + R_2 = R$  for different values of the total rate R. The dots correspond to the pair  $(R_1, R_2)$  leading to minimum power at a given rate  $R_1 + R_2 = R$ , i.e.

$$\min_{\substack{R_1 + R_2 = R\\ R_1, R_2 \ge 0}} \Theta(R_1, R_M) + \Theta(R_2, R_M)$$
(11)

Since the two buses are identical and the cost function  $\Theta(\cdot, R_M)$  is convex, the optimal solution is  $R_1 = R_2 = R/2$ .

*Case 5.2:* A variation of the previous case is shown in Figure 6. The buses are dedicated but have different rates,  $R_{M_1}$ ,  $R_{M_2}$ .

Figure 7 (dashed surface) shows the total power consumption as a function of the two rates  $R_1$  and  $R_2$  when  $R_{M_1} = 4$  and  $R_{M_2} = 1$ . Again, the solid lines indicate the intersection of the surface with the plane  $R_1 + R_2 = R$  for different values of the total rate R. The dots correspond to the pairs  $(R_1, R_2)$ solving the power minimization problem

$$\min_{\substack{R_1 + R_2 = R\\ 0 \le R_1 \le R_{M_1}\\ 0 \le R_2 \le R_{M_2}}} \Theta(R_1, R_{M_1}) + \Theta(R_2, R_{M_2}) \tag{12}$$



Fig. 6. Two different and dedicated buses from node 1 to node 2



Fig. 7. Total power consumption of the scheme in Figure 6 when  $R_{M_1}=4 \mbox{ and } R_{M_2}=1$ 

The Kuhn-Tucker conditions for (12) imply  $R_1/R_2 = R_{M_1}/R_{M_2}$  and the following solution for  $0 \le R \le R_{M_1} + R_{M_2}$ 

$$R_1 = \frac{R_{M_1}}{R_{M_1} + R_{M_2}} R , \quad R_2 = \frac{R_{M_2}}{R_{M_1} + R_{M_2}} R$$
(13)

*Case 5.3:* Suppose that the buses in the configuration of Figure 6 are shared as shown in Figure 8; signals of total rate  $R_3$  are also transmitted through bus 1 and signals of total rate  $R_4$  are transmitted through bus 2. These imply the constraints  $0 \le R_1 \le R_{M_1} - R_3$  and  $0 \le R_2 \le R_{M_2} - R_4$ .



Fig. 8. Two different and shared buses from node 1 to node 2

Without any transmission from node 1 to node 2 the power consumption is  $\Theta(R_3, R_{M_1}) + \Theta(R_4, R_{M_2})$  due to  $R_3$  and  $R_4$ . We want to minimize the additional power consumption,  $P_{ad}$ , due to R, where

$$P_{ad}(R_1, R_2) = \Theta(R_1 + R_3, R_{M_1}) - \Theta(R_3, R_{M_1}) + \Theta(R_2 + R_4, R_{M_2}) - \Theta(R_4, R_{M_2})$$
(14)

The corresponding optimization problem is

$$\min_{\substack{R_1 + R_2 = R\\ 0 \le R_1 \le R_{M_1} - R_3\\ 0 \le R_2 \le R_{M_2} - R_4}} P_{ad}(R_1, R_2)$$
(15)

The solution of the problem is:

395

• If  $\frac{R_3}{R_{M_1}} \ge \frac{R_4}{R_{M_2}}$ : then the solution of (15) is  $R_1 = 0$ and  $R_2 = R$  when  $0 \le R \le \frac{R_{M_2}}{R_{M_1}}R_3 - R_4$ , and, for  $R > \frac{R_{M_2}}{R_{M_1}}R_3 - R_4$  it is given by

$$R_1 = \frac{R_{M_1}R_4 - R_{M_2}R_3}{R_{M_1} + R_{M_2}} + \frac{R_{M_1}}{R_{M_1} + R_{M_2}}R \qquad (16)$$

$$R_2 = \frac{R_{M_2}R_3 - R_{M_1}R_4}{R_{M_1} + R_{M_2}} + \frac{R_{M_2}}{R_{M_1} + R_{M_2}}R$$
(17)

• If  $\frac{R_3}{R_{M_1}} < \frac{R_4}{R_{M_2}}$ : then  $R_1 = R$  and  $R_2 = 0$  for  $0 \le R \le R_3 - \frac{R_{M_1}}{R_{M_2}}R_4$ , and, for  $R > R_3 - \frac{R_{M_1}}{R_{M_2}}R_4$  the solution is given by (16) and (17).



Fig. 9. Power consumption of the scheme in Figure 8 due to  $R_1$ ,  $R_2$ 

Figure 9 (dashed surface) shows the additional power consumption,  $P_{ad}$ , as a function of the two rates  $R_1$  and  $R_2$  when  $R_{M_1} = 4$ ,  $R_{M_2} = 2$ ,  $R_3 = 1.5$  and  $R_4 = 1.5$ . The solid lines indicate the intersection of the surface with the plane  $R_1 + R_2 = R$  for different values of the rate R. The dots correspond to the solution of the minimization problem (15).

*Case 5.4:* Consider now the network in Figure 10. Node 1 transmits to node 2, at rate R, through the five buses which are also used for carrying other information streams as well. The total rates in the buses are  $R_1$  to  $R_5$  and their capacities are  $R_{M_1}$  to  $R_{M_5}$  respectively (the capacities are not shown).



Fig. 10. Network of shared buses supporting communication from node 1 to node 2

The network implies the following constraints:

$$\begin{array}{rclcrcr} 0 &\leq & R_k &\leq R_{M_k} \ , \ \ k=1,2,...,5 \\ R+U_1 &= & R_2+R_1 \\ U_2+R_1 &= & R_3+R_4 \\ U_3+R_2+R_3 &= & R_5 \end{array}$$

and the total power consumption is

$$P_T = \sum_{k=1}^{5} \Theta(R_k, R_{M_k})$$
(18)

The problem of minimization of  $P_T$ , given the constraints above, can be solved using the Kuhn-Tucker conditions.

For example, if  $R_{M_1} = 3$ ,  $R_{M_2} = 3$ ,  $R_{M_3} = 2$ ,  $R_{M_4} = 3$ ,  $R_{M_5} = 5$ ,  $U_1 = 2$ ,  $U_2 = 2$  and  $U_3 = 1$ , and the transmission rate is R = 2, then the minimum power consumption is achieved if we route the data (rates) as:  $R_1 = 1.58$ ,  $R_2 = 2.42$ ,  $R_3 = 0.85$ ,  $R_4 = 2.73$  and  $R_5 = 4.27$ . The corresponding minimum power is  $P_T = 6.29$ .

## VI. CONCLUSIONS

The power - optimal traffic routing in bus - networks has been approached from an information theoretic perspective.

Mathematical relations between power consumption and information traffic in bus - networks have been introduced and led to derivations of optimal performance bounds. Several examples have been used to illustrate the abstract concepts.

The work has been motivated by the major requirement for power efficient communication strategies in Networks-on-Chips.

## REFERENCES

- L. Benini, G. D. Micheli, "Networks on chip: a new paradigm for systems on chip design", *Proceedings Design, Automation and Test in Europe Conference and Exhibition*, pp. 418-419, 2002.
   L. Benini and G. DeMicheli, "Networks on Chips: A New Paradigm for
- [2] L. Benini and G. DeMicheli, "Networks on Chips: A New Paradigm for Component-Based MPSoC Design", *IEEE Computer*, Jan. 2002, pp. 70-78.
- [3] P. Wielage and K. Goossens, "Networks on Silicon: Blessing or Nightmare ?", In Proc. of Euromicro Symposium on Digital System Design Architectures, Methods and Tools, pp. 196-200, 2002.
- [4] C. Cheng, J. Lillis, S. Lin, N. Chang, Interconnect Analysis and Synthesis, John Wiley and Sons, 2000.
- [5] K.Y. Khoo, A. Willson, Jr., "Charge recovery on a databus", *IEEELACM International Symposium on Low Power Electronics and Design*, pp. 185-189, 1995.
- [6] M. Stan, W. Burleson, "Low-power encodings for global communication in cmos VLSI", *IEEE Transactions on VLSI Systems*, pp. 49-58, Vol. 5, No. 4, Dec. 1997.
- [7] L. Benini, G. De Micheli, E. Macii, D. Sciuto and C. Silvano, "Asymptotic zero-transition activity encoding for address busses in low-power microprocessor-based systems", *Proceedings of the Great Lakes Symposium*, *VLSI*, pp. 77-82, 1997.
- [8] H. Zhang, J. Rabaey, "Low swing interconnect interface circuits", *IEEE-ACM International Symposium on Low Power Electronics and Design*, pp. 161-166, Aug. 1998.
- [9] S. Ramprasad, N. Shanbhag, I. Hajj, "A coding framework for low-power address and data busses", *IEEE Transactions on VLSI Systems*, pp. 212-221, Vol. 7, No. 2, June 1999.
- [10] P. Sotiriadis, A. Chandrakasan, "Low power bus coding techniques considering inter-wire capacitances", *IEEE Custom Integrated Circuits Conference*, pp. 507-510, 2000.
- [11] P. Sotiriadis, A. Chandrakasan, "Bus Energy Reduction by Transition Pattern Coding Using a Detailed Deep Submicrometer Bus Model", *IEEE Transactions on Circuits and Systems-I*, Vol. 50, No. 10, Oct. 2003, pp. 1280-1295.
- [12] P. Sotiriadis, V. Tarokh, A. Chandrakasan, "Energy Reduction in VLSI Computation Modules: An Information - Theoretic Approach", *IEEE Transactions on Information Theory*, April 2003, Vol. 49, No. 4, pp. 790-808.
- [13] P. Sotiriadis, A. Chandrakasan, "Power Estimation in Deep Sub-Micron Buses, Statistical Measures and Energy Limits of Communication", *Journal* of Circuits, Systems, and Computers, Vol. 11, No. 6, Feb. 2003, pp. 637-658.
- [14] P. Sotiriadis, A. Chandrakasan, "A Bus Energy Model for Deep Sub-Micron Technology", *IEEE Transactions on VLSI Systems*, Vol. 10, No. 3, June 2002, pp. 341-350.
- [15] T. Cover, J. Thomas, *Elements of Information Theory*, John Wiley and Sons, 1991.
- [16] R. Marculescu, D. Marculescu, M. Pedram, "Probabilistic Modeling of Dependencies During Switching Activity Analysis", *IEEE Transactions Computer-Aided Design of Integrated Circuits and Systems*, vol. 17, pp. 73-83, June 1998.

396