# Analytical Queue Modeling for Network-on-Chip Router

A. Lit, H. Mohamed Basri, M. F. M. Sabri, N. Zamhari, K. Lias, M. H. Husin and A. S. W. Marzuki

Abstract—Routers are important modules in any Networkson-Chip (NoC)-based design. In order to achieve an satisfactory performance, routers must be designed to match network intermodule traffic. One of the most important methods to accomplish this matching is to improve the throughput and minimize the packet loss and router delay. An early approximation of the router delay is essentially required to aid designers to determine the system timing constrains at the higher levels of abstraction. This paper presents an analytical queue model for NoC routers. Furthermore, it explains how this model can be employed to study the consequence of changing the output traffic and queue size on the router in term of throughput, efficiency, packet loss probability and waiting time. The proposed model implemented a simple M/M/1/B markov chain as queuing model.

*Keywords*—Queue Modeling, Routers, Network-on-Chip, Markov chain

## I. INTRODUCTION

The rapid growing complexity of System-on-Chip (SoC) designs is the main motivation for both academician and industrial practitioner to come up with better solutions for the interconnection problem of internal system modules [3]. With SoC designs that have hundreds of computational units, interconnection utilising a single shared bus is no longer practical. An NoC is a fresh emerging paradigm that renders an integrated solution for accomplishing an effective inter-module communication [4].

Routers are vital modules in NoC-designs. Hence, switching methodologies and routing techniques are the focus of many researchers [5], [6]. Although a few router designs have been proposed [7]-[10] these designs always assume that only one of the clock edges is used to control data flow inside the router, which results in a mismatch between the router modeling and some implementation techniques that use both clock edges for data processing. Meanwhile in [2] Haytham et al. proposed a delay model for NoC output queuing router which the analysis was restricted with the latency only. This paper treat this problem by suggesting an analytical queue model router that considers numerous performance metrics which are throughput, efficiency, loss and waiting time. By applying this model, the affect of varying the output traffic and the queue size is analysed. This method is implemented to the design of an input queuing router as a case study. However, the same approach can be used to all other types of routers.

The rest of this paper is organized as follows. Section II presents related work in NoC-based router delay modeling. Section III gives an overview of the router structure. Section IV presents the modeling NoC router. Section V shows the queue performance. Section VI discusses the experimental results. Finally, Section VII presents our conclusions and future work.

# II. RELATED WORKS

Researchers addressed the NoC router delay model from various views [11]. Chien et al. suggested a delay model for virtual channel and wormhole routers [12]. Nevertheless, this model was designed for 0.8 - micron CMOS which cannot be implemented for pipelined architectures. Meanwhile, Lopez et al. proposed an extension to Chien's model for pipelined routers [13]. But Lopez's model assumes that the time duration of the clock cycle counts on the router latency, which is impractical assumption. Moreover, Peh et al. proposed a delay model for a variable pipelined wormhole router with fixed time cycle to address the Lopez's model problem [14]. However, these models cannot be applied for router designs that use both clock edges and furthermore they did not study the impact of changing router design parameters on its delay. Hence, the model proposed in this paper is developed to address this problem.

# **III. ROUTER STRUCTURE**

Similar to the computer network, router also operates at the network layer. A router uses packet headers and a forwarding table to determine the best way a packet should route among the networks. An NoC router has three main architectural components, they are input/output(IO) ports, queues, and switch fabric (SF). The SF establishes the required paths between pairs of input and output ports according to a certain routing mechanism such as round-robin scheduler, weighted round-robin scheduler, and max-min fairness scheduling [1].

Fig. 1 shows an input-queuing router. As can be seen, each input port has a dedicated first-in first-out (FIFO) queue for storing incoming packets. In one time step, an input queue must be able to support one write and one read operations. The merit behind of an input queuing router is the low memory speed requirement, distributed traffic management at each input port, and also distributed table lookup at each input port. In addition, it supports packets broadcast and multicast without the need to duplicate the packet. On the contrary, the main drawback is the head of line (HoL) problem when the

Asrani Lit is with the Faculty of Engineering, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia (e-mail: lasrani@feng.unimas.my). Other authors are with the Faculty of Engineering, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia



Fig. 1. Input-queuing router

packet at the head of the queue is blocked from accessing the desired output port [1]. There are three potential causes for packet loss, fully populated input queue, internal blocking due to blocked the SF, and when SF is busy serving another packet.

# IV. Modeling NoC router as an M/M/1/B

This section addresses an analytical framework for inputqueuing router. Each queue is considered as a *FIFO* queue. The model has simple close-form calculations and produces the performance of the queue.

## A. Notation

There are a number of notations involved in this model as listed in Table I.

TABLE I The Parameter Notation

| Notation   | Description                                     |
|------------|-------------------------------------------------|
| a          | Probability of packet arrival                   |
| b          | Probability of a packet not arrive, $b = 1 - a$ |
| c          | Probability that a packet leaves the queue      |
| d          | Probability that a packet does not leaves the   |
|            | queue, $d = 1 - c$                              |
| i          | <i>i</i> packets in the queue                   |
| $s_i$      | The queue is in state $s_i$                     |
| ρ          | Distribution index                              |
| $N_a(out)$ | Average output traffic                          |
| $N_a(in)$  | Average input traffic                           |
| B          | Size of buffer B                                |
| Th         | Throughput                                      |
| $\eta$     | Efficiency                                      |
| T          | Time step value                                 |
| L          | Packet Loss probability                         |
| Qa         | Average Queue Size                              |
| W          | Wait Time                                       |

# B. M/M/1/B Router Model

A simple M/M/1/B queue is used in this model. This model provides a discrete time Markov chain [1] analysis of queue where the time step is taken equal to the time required to transmit a packet. Poisson distribution traffic arrival process and the exponential distributed service time is assumed. The '1' means there are one server and 'B' means the queue has a finite buffer of size B.

Fig. 2 shows, the state transition diagram for the discretetime Markov chain M/M/1/B queue. A homogeneous Markov Chain is considered since packet arrivals and departures are independent of the time index value [1]. When a packet arrives, it is assumed that it could be serviced at the same time step with probability c.



Fig. 2. State Transition Diagram for discrete-time M/M/1/B

The probability state  $s_i$  that the queue has i packets is given by:

$$s_i = \rho^i s_0 \qquad 0 < i < B \tag{1}$$

where  $\rho$  is the distribution index:

$$\rho = \frac{ad}{bc} \tag{2}$$

The probability that the queue is empty:

$$s_o = \frac{1 - \rho}{1 - \rho^{B+1}}$$
(3)

The equilibrium distribution for the other states is given from equation 1 by:

$$s_i = \frac{(1-\rho)\rho^i}{1-\rho^{B+1}} \qquad 0 < i < B \tag{4}$$

# V. M/M/1/B Queue Performance

The queue performance is analyzed in term of throughput, average queue size, efficiency as well as packet delay. The output traffic or average of packets leaving the queue per time step is given by:

$$N_a(out) = acs_o + \sum_{i=1}^{\infty} cs_i \tag{5}$$

Throughput demonstrate how many end-to-end of packet transfer. The throughput of the queue in units of packets per time step is given by:

$$Th = N_a(out)$$
  
=  $acs_o + \sum_{i=1}^{\infty} cs_i$   
=  $acs_o + c(1 - s_o)$   
=  $c(1 - bs_o)$  (6)

The throughput is measured in units of packets/s is:

$$Th' = \frac{Th}{T} \tag{7}$$

The input traffic is measured in units of packets per time step is given by:

$$N_a(in) = (1 \times a) + (0 \times b) = a$$
 (8)

The efficiency is defined as:

$$\eta = \frac{N_a(out)}{N_a(in)}$$
$$= \frac{Th}{a}$$
$$= \frac{c(1-bs_o)}{a}$$
(9)

Data is lost when it is full and packets arrive but can not leave. The average lost traffic is measured in units of packets per time step and is given by:

$$N_a(lost) = s_B ad \tag{10}$$

The packet loss probability, L is the ratio of lost traffic relative to the input traffic:

$$L = \frac{N_a(lost)}{N_a(in)} = s_B d \tag{11}$$

The average queue size shows queue occupancy and given by:

$$Q_a = \sum_{i=0}^{B} is_i \tag{12}$$

Queue size is measured in units of packets. Using equation (4), the average queue size,  $Q_a$ :

$$Q_a = \frac{\rho \times [1 - (B+1)\rho^B + B\rho^{B+1}]}{(1-\rho) \times (1-\rho^{B+1})}$$
(13)

Latency or waiting time is define as the time to transfer a packet on the channel. The waiting time is measured in units of packets per time step:

$$W = \frac{Q_a}{Th} \tag{14}$$

# VI. EXPERIMENTAL RESULTS

Equations of M/M/1/B queue performance is used to study the relation between the packet loss probability, delay, queue size, throughput, and efficiency based on changing the output traffic and queue size. The importance of these close form equations for delay, loss, and throughput could be further used to model the whole router.

Fig. 3 shows throughput, efficiency, loss probability, and



Fig. 3. Throughput, efficiency, loss probability and delay versus input traffic when changing output traffic,  $\!c$ 

delay versus input traffic when changing output traffic. The values of the output traffic is in the range of 0 to 1. The input traffic means that packet arrival probability for the queue router and output traffic is probability that a packet leaves the queue. The throughput increases smoothly until as soon as input traffic reaches the value of output traffic,  $c \in \{0.2, 0.5, 0.7\}$ . The throughput of the queue could not exceed the maximum value for the average output traffic, c. Meanwhile, the efficiency of the queue is close to 100% at starting point and reduces when the input traffic approaches the maximum  $c \in \{0.2, 0.5, 0.7\}$ .

In the Fig. 3 presents the packet loss probability is always present but starts to increase when the input traffic approach the packet maximum output traffic, c. The output traffic, c =0.7 has the lowest packet loss probability compared to c = 0.2and c = 0.5. The more probability that a packet leaves the queue, the less packet loss occurred. This illustrates that packet delay increases significantly when the input traffic approaches the maximum packet service probability. At a given time step at most one packet arrives and at most one packet departs. The packet delay increases two times from 10 to 20 time steps for c = 0.7 and c = 0.5 representatively. Meanwhile for c = 0.5and c = 0.2, the packet delay raises more than twice to almost 50 time steps. The more probability that a packet leaves the queue, the less delay occurred. Congestion conditions occur as soon as the input traffic exceeds the maximum output traffic, c. Beside, the higher of the output traffic, the lesser the packet loss, and delay occurred.

Fig. 4 shows throughput, efficiency, loss probability and

delay versus input traffic when changing the queue size for B = 10, 20, 50. As shown, the change of the queue size does not have obvious affect for the throughput and efficiency. However, the queue size significantly affects the packet loss probability and delay. Fig. 4 presents that increase in B results in the decreases in loss probability. It can be seen the queue size from 10 to 50 improves the packet loss probability for small values of a. However, the packet delay increases with the increase of B. Hence, at 0.6 packet arrival rate with the change of the queue size from 10 to 50 increases the delay by 5 times. The increase in queue size, decreases packet loss probability while increases the delay. Therefore, the queue size, B is proportional to delay. The delay reaches a maximum value by the maximum B and the maximum c [1]. Furthermore, the length of queue size also can influence the area of resources in NoC.



Fig. 4. Throughput, efficiency, loss probability and delay versus input traffic when changing queue size, B

## VII. CONCLUSION

In this paper, an analytical model of router queue for NoC using Markov Chain analysis is presented. Close form simulation results showed that changing the queue size improved the packet loss but increased the delay. Moreover, the congestion conditions occured as soon as the input traffic exceeds the maximum output traffic. Obtaining an estimation of the queue performance is helpful for designer to identify the significant parameters at the early design phase for modeling NoC. Experimental results showed how to use the proposed model to obtain an early estimation of routing throughput, efficiency, waiting time and loss. This research can be advanced in several directions. First, the impact of using different service rates on the routing delay and throughput can be investigated. Another extension is to study the impact of using various routing protocols on the router.

## VIII. ACKNOWLEDGEMENT

The authors would like to express their sincere gratitude to Research and Innovation Management Centre (RIMC), Universiti Malaysia Sarawak (UNIMAS) for the financial support.

#### REFERENCES

- Fayez Gebali, "Computer Communication Networks Analysis and Design", Northar digital design, Victoria. 2004.
- [2] H. Elmiligi, M. Watheq El-Karashi and F. Gebali, "A Delay Model for Networks-on-Chip Output-Queuing Router", The 6th International Workshop on System on Chip for Real Time Applications, October 2006.
- [3] P. Pande, C. Grecu, A. Ivanov, R. Saleh, and G. De Micheli, "Design, synthesis, and test of networks on chips," IEEE Design and Test of Computers, vol. 22, no. 5, pp. 404-413, September 2005.
- [4] L. Benini and G. D. Micheli, "Networks on chips: A new SoC paradigm," IEEE Computer, vol. 35, no. 1, pp. 70-78, January 2002.
- [5] N. Kapre, N. Mehta, M. deLorimier, R. Rubin, H. Bamor, M. J. Wilson, M. Wrighton, and A. DeHon, "Packet switched vs. time multiplexed FPGA overlay networks," In Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines, pp. 205-216, Los Alamitos, CA, April 2006.
- [6] C. Hilton and B. Nelson, "PNoC: A flexible circuit-switched NoC for FPGA-based systems," in IEE Proceedings of Computers and Digital Techniques, vol. 153, no. 3, pp. 181-188, May 2006.
- [7] J. Kim, D. Park, T. Theocharides, N. Vijaykrishnan, and C. R. Das, "A low latency router supporting adaptivity for on-chip interconnects," Proceedings of tlhe 42nd Annual Conference on Design Auitomation, pp. 559-564, Califonia, June 2005.
- [8] E. Rijpkema, K. Goossens, and P. Wielage, "A router architecture for networks on silicon," Proceedings of 2nd Workshop on Embedded Systems, Veldhoven, the Netherlands, October 2001.
- [9] R. Mullins, A. West, and S. Moore, "Low-latency virtual-channel routers for on-chip networks," Proceedings of 31st Antnual Interniational Symposium on Computer Architecture, pp. 188-197, Munich, Gemamy, June 2004.
- [10] C. Zeferino, M. Kreutz, and A. Susin, "RASoC: A router soft-core for networks-on-chip," In Proceedings of Design, Automation anid Test in Europe Conference and Exhibition (DATE), vol. 3, pp. 198-203, Paris, February 2004.
- [11] T. Bjerregaard and K. mahadevan, "A survey of research and practices of network-on-chip," ACM Computing Surveys, vol. 38, pp. 1-51, March 2006.
- [12] A. Chien, "A cost and speed model for k-ary n-cube wormhole routers," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 2, pp. 29-36, February 1998.
- [13] J. Duato and P. Lopez, "Pefoniiance evaluation of adaptive routinlg algorithms for k-ary n-cubes," Proceedings of Parallel computer Routing and Communicationi Workshop, pp. 45-59, Seattle, WA, November 1994.
- [14] L.-S. Peh and W. J. Dally, "A delay model for router microarchitectures," IEEE Micro, vol. 21, no. 1, pp. 26-34, January 2001.