Publication No 40158


Mutter, A.*


A Novel Hybrid Memory Architecture for High-Speed Packet Buffers in Network Nodes - Communication Networks and Computer Engineering Report No. 108






Routers are the prevalent type of network node in today's Internet. A router processes incoming packets and forwards them towards their destination. Core routers, i.e., routers that operate in the core of the Internet, can contain hundreds of ports to interconnect many network segments. Temporarily unbalanced traffic between the ports of a router can lead to overload situations. To minimize packet loss, routers contain packet buffers that hold packets during times of congestion. To provide the large buffering capacities required, packet buffers are typically implemented with DRAM (Dynamic Random Access Memory). One major problem in building high-speed packet buffers is that line rates, and with them packet rates, grow much faster than the random access time of DRAM decreases. The random access time of a memory bounds the rate of individual accesses to the memory device. At a line rate of 10 Gbps, DRAM random access time was just short enough to meet the required access time. Since then, the gap between these values has steadily increased. For example, on a 100 Gbps link an Ethernet frame can arrive every 6.7 ns, but DRAM random access time is approximately 50 ns. A hybrid memory architecture can close this gap by combining the strengths of the two major memory technologies: the short random access time of SRAM (Static Random Access Memory) and the large capacity of DRAM. However, the architecture proposals in the literature that provide deterministic bandwidth suffer from high memory resource requirements and inefficient memory resource utilization. The main reasons for this are fragmentation, inefficient DRAM data bus utilization, and large required SRAM capacities. These properties limit scalability and increase cost and power consumption. This thesis proposes a novel hybrid memory architecture for high-speed packet buffers that delivers deterministic bandwidth.
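The gap between packet arrival time and DRAM access time can be illustrated with a short calculation. This sketch assumes a minimum-size Ethernet frame of 64 bytes plus 8 bytes preamble and 12 bytes inter-frame gap on the wire (the constants and the helper function are illustrative, not from the thesis):

```python
# Minimum packet inter-arrival time vs. DRAM random access time.
# Assumption: a minimum-size Ethernet frame occupies 64 bytes payload
# plus 8 bytes preamble and 12 bytes inter-frame gap on the wire.

MIN_FRAME_ON_WIRE_BITS = (64 + 8 + 12) * 8   # 672 bits
DRAM_RANDOM_ACCESS_NS = 50.0                 # approximate value, per the text

def min_arrival_interval_ns(line_rate_gbps: float) -> float:
    """Shortest time between two minimum-size frames at a given line rate.
    bits / (bits per ns) = ns, since 1 Gbps = 1 bit/ns."""
    return MIN_FRAME_ON_WIRE_BITS / line_rate_gbps

for rate in (10, 40, 100):
    interval = min_arrival_interval_ns(rate)
    print(f"{rate:>3} Gbps: frame every {interval:5.2f} ns "
          f"(DRAM random access: {DRAM_RANDOM_ACCESS_NS} ns)")
```

At 10 Gbps a frame arrives at most every 67.2 ns, so a single DRAM can keep up; at 100 Gbps the interval shrinks to 6.72 ns, far below the DRAM random access time, which is exactly the gap the hybrid architecture closes.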
The novelty of the architecture is that it significantly reduces the memory resources compared to related architectures from the literature while providing the same functionality. Memory resources here refer to the required SRAM and DRAM capacity and bandwidth, as well as the DRAM data bus pin count. The feasibility of the architecture in hardware at high line rates is shown by a prototypical packet buffer implementation. The thesis first introduces the fundamentals of packet buffering. It addresses the potential locations for a packet buffer in a router, defines the term packet buffer, and introduces its basic building blocks. It also quantifies the requirements a packet buffer has to satisfy and compares them to cutting-edge SRAM and DRAM devices. The focus then shifts to hybrid memory architectures and the metrics necessary to evaluate them. Architecture proposals from the literature are surveyed and their pros and cons discussed. The main objective for the design of the novel hybrid memory architecture was to reduce memory resource requirements without reducing functionality. Besides providing deterministic bandwidth, the design targets were a reduction of the SRAM capacity and a reduction of the DRAM resources compared to related architectures. The architecture features necessary to meet all targets simultaneously are derived. For example, packets are aggregated into blocks, and only blocks are buffered in SRAM and DRAM. As aggregation eliminates fragmentation, this decreases the required bandwidth and capacity of SRAM and DRAM. To further decrease the SRAM capacity significantly, the queues maintained in the SRAM share it dynamically. The architecture contains a tail buffer (SRAM), a head buffer (SRAM), and a set of parallel DRAMs (or DRAM banks). The degree of parallelism can be freely chosen. The task of a high-speed packet buffer is to maintain a set of FIFO queues that hold the packets.
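The tail buffer / DRAM / head buffer split and the aggregation of packets into blocks can be sketched as a toy model of a single FIFO queue. This is a minimal illustration only, assuming a fixed block size of 64 bytes and treating the queue as a byte stream; it ignores the MMA, timing guarantees, packet boundaries, and the dynamic SRAM sharing described in the thesis:

```python
from collections import deque

BLOCK_SIZE = 64  # bytes per block (illustrative choice)

class HybridQueue:
    """Toy model of one FIFO queue in the hybrid buffer: packets are
    aggregated into fixed-size blocks in the SRAM tail buffer, full
    blocks migrate to DRAM, and the head buffer refills from DRAM."""

    def __init__(self):
        self.tail = bytearray()   # SRAM tail buffer: partially filled block
        self.dram = deque()       # DRAM: full blocks (middle of the queue)
        self.head = bytearray()   # SRAM head buffer: bytes ready for dequeue

    def enqueue(self, packet: bytes) -> None:
        self.tail += packet
        # Only complete blocks are written to DRAM, so every DRAM write
        # is fully utilized and internal fragmentation is eliminated.
        while len(self.tail) >= BLOCK_SIZE:
            self.dram.append(bytes(self.tail[:BLOCK_SIZE]))
            del self.tail[:BLOCK_SIZE]

    def dequeue(self, nbytes: int) -> bytes:
        # Refill the head buffer block-wise from DRAM as needed.
        while len(self.head) < nbytes and self.dram:
            self.head += self.dram.popleft()
        out = bytes(self.head[:nbytes])
        del self.head[:nbytes]
        return out
```

For example, enqueueing a 100-byte packet leaves one full block in DRAM and 36 residual bytes in the tail buffer; a subsequent dequeue is served from the head buffer after it refills from DRAM.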
As in related architectures, the tail buffer holds the queue tails, the DRAMs hold the middle parts of the queues, and the head buffer holds the queue heads. A new memory management algorithm (MMA) is proposed. Further, two MMAs from the literature are used in combination with this architecture. An MMA defines how blocks are distributed to the DRAMs and how blocks are transferred between SRAM and DRAM. The typical metrics are derived to evaluate the architecture quantitatively: upper bounds for the tail buffer and head buffer sizes, as well as the read latency of the system. For two MMAs these are formally proven. A detailed comparison of the metrics to those of other architectures is performed. It is shown that the proposed architecture reduces the tail and head buffer sizes by up to 50%. The read latency is similar or equal to that of other architectures. The required DRAM resources are also compared to those of related architectures and are likewise reduced significantly, as the proposed architecture is the only one that eliminates internal and external fragmentation while also using bank interleaving. The first two properties reduce the DRAM bandwidth and capacity to the theoretical minimum, while the third minimizes the DRAM data bus pin count needed to provide the bandwidth. Finally, a dimensioning example for the proposed memory architecture is provided. It shows how to take advantage of the degrees of freedom, e.g., the degree of parallelism. The feasibility of the architecture is shown by a prototypical implementation of a corresponding packet buffer. The prototype was described in VHDL, and an FPGA development board served as the platform. The implementation is presented in detail. Functional simulations and tests on the FPGA validate the correct behavior of the prototype. Place & Route results show that the prototype supports a line rate of over 10 Gbps with 64 FIFO queues, despite using an over six-year-old FPGA.
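The core idea of distributing blocks over parallel DRAM banks can be illustrated with a simple round-robin sketch. Note that this is not one of the MMAs from the thesis, which are considerably more elaborate; it merely shows how bank interleaving lets each bank be accessed at a fraction of the aggregate block rate (the bank count of 4 is an illustrative assumption):

```python
NUM_BANKS = 4  # degree of DRAM parallelism, freely chosen (illustrative)

def distribute(blocks):
    """Round-robin sketch of bank interleaving: consecutive blocks are
    spread over parallel DRAM banks, so each bank sees only 1/NUM_BANKS
    of the block rate, hiding the long per-bank random access time."""
    banks = [[] for _ in range(NUM_BANKS)]
    for i, blk in enumerate(blocks):
        banks[i % NUM_BANKS].append(blk)
    return banks

print(distribute(list(range(8))))
```

With 8 consecutive blocks and 4 banks, each bank receives every fourth block, so two back-to-back accesses to the same bank are separated by four block times.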
The objective of the implementation was to achieve full functionality. A newly optimized implementation on an ASIC is estimated to support significantly more FIFO queues and line rates of 100 Gbps and beyond. In conclusion, the significant reduction of memory resources improves scalability towards higher line rates and larger numbers of queues compared to related architectures. Further, the reduction of memory resources also improves energy efficiency, as a packet buffer consists for the most part of memory.



Reference entry

Mutter, A.
A Novel Hybrid Memory Architecture for High-Speed Packet Buffers in Network Nodes - Communication Networks and Computer Engineering Report No. 108
Dissertation, Universität Stuttgart, Informatik, Elektrotechnik und Informationstechnik, 2012


Authors marked with an asterisk (*) were IKR staff members at the time the publication has been written.