Algorithms and Architectures for Efficient Low Density Parity Check (LDPC) Decoder Hardware

Algorithms and Architectures for Efficient Low Density Parity Check (LDPC) Decoder Hardware
Author: Tinoosh Mohsenin
Publisher:
Total Pages:
Release: 2010
Genre:
ISBN: 9781124509181

Many emerging and future communication applications require a significant amount of high throughput data processing and operate with decreasing power budgets. This need for greater energy efficiency and improved performance of electronic devices demands a joint optimization of algorithms, architectures, and implementations. Low Density Parity Check (LDPC) decoding has received significant attention due to its superior error correction performance, and has been adopted by recent communication standards such as 10GBASE-T 10 Gigabit Ethernet. Currently high performance LDPC decoders are designed to be dedicated blocks within a System-on-Chip (SoC) and require many processing nodes. These nodes require a large set of interconnect circuitry whose delay and power are wire-dominated circuits. Therefore, low clock rates and increased area are a common result of the codes' inherent irregular and global communication patterns. As the delay and energy costs caused by wires are likely to increase in future fabrication technologies new solutions dealing with future VLSI challenges must be considered. Three novel message-passing decoding algorithms, Split-Row, Multi-Splitand Split-Row Threshold are introduced, which significantly reduce processor logical complexity and local and global interconnections. One conventional and four Split-Row Threshold LDPC decoders compatible with the 10GBASE-T standard are implemented in 65 nm CMOS and presented along with their trade-offs in error correction performance, wire interconnect complexity, decoder area, power dissipation, and speed. For additional power saving, an adaptive wordwidth decoding algorithm is proposed which switches between a 6-bit Normal Mode and a reduced 3-bit Low Power Mode depending on the SNR and decoding iteration. A 16-way Split-Row Threshold with adaptive wordwidth implementation achieves improvements in area, throughput and energy efficiency of 3.9x, 2.6x, and 3.6x respectively, compared to a MinSum Normalized implementation, with an SNR loss of 0.25 dB at BER = 10−7. The decoder occupies a die area of 5.10 mm2, operates up to 185 MHz at 1.3 V, and attains an average throughput of 85.7 Gbps with early-termination. Low power operation at 0.6 V gives a worst case throughput of 9.3 Gbps--above the 6.4 Gbps 10GBASE-T requirement, and an average power of 31 mW.

Resource Efficient LDPC Decoders

Resource Efficient LDPC Decoders
Author: Vikram Arkalgud Chandrasetty
Publisher: Academic Press
Total Pages: 192
Release: 2017-12-05
Genre: Technology & Engineering
ISBN: 0128112565

This book takes a practical hands-on approach to developing low complexity algorithms and transforming them into working hardware. It follows a complete design approach – from algorithms to hardware architectures - and addresses some of the challenges associated with their design, providing insight into implementing innovative architectures based on low complexity algorithms.The reader will learn: Modern techniques to design, model and analyze low complexity LDPC algorithms as well as their hardware implementation How to reduce computational complexity and power consumption using computer aided design techniques All aspects of the design spectrum from algorithms to hardware implementation and performance trade-offs Provides extensive treatment of LDPC decoding algorithms and hardware implementations Gives a systematic guidance, giving a basic understanding of LDPC codes and decoding algorithms and providing practical skills in implementing efficient LDPC decoders in hardware Companion website containing C-Programs and MATLAB models for simulating the algorithms, and Verilog HDL codes for hardware modeling and synthesis

VLSI Architectures for Multi-Gbps Low-Density Parity-Check Decoders

VLSI Architectures for Multi-Gbps Low-Density Parity-Check Decoders
Author: Ahmad Darabiha
Publisher:
Total Pages: 228
Release: 2008
Genre:
ISBN: 9780494398173

Near-capacity performance and parallelizable decoding algorithms have made Low-Density Parity Check (LDPC) codes a powerful competitor to previous generations of codes, such as Turbo and Reed Solomon codes, for reliable high-speed digital communications. As a result, they have been adopted in several emerging standards. This thesis investigates VLSI architectures for multi-Gbps power and area-efficient LDPC decoders. To reduce the node-to-node communication complexity, a decoding scheme is proposed in which messages are transferred and computed bit-serially. Also, a broadcasting scheme is proposed in which the traditional computations required in the sum-product and min-sum decoding algorithms are repartitioned between the check and variable node units. To increase decoding throughput, a block interlacing scheme is investigated which is particularly advantageous in fully-parallel LDPC decoders. To increase decoder energy efficiency, an efficient early termination scheme is proposed. In addition, an analysis is given of how increased hardware parallelism coupled with a reduced supply voltage is a particularly effective approach to reduce the power consumption of LDPC decoders. These architectures and circuits are demonstrated in two hardware implementations. Specifically, a 610-Mbps bit-serial fully-parallel (480, 355) LDPC decoder on a single Altera Stratix EP1S80 device is presented. To our knowledge, this is the fastest FPGA-based LDPC decoder reported in the literature. A fabricated 0.13-mum CMOS bit-serial (660, 484) LDPC decoder is also presented. The decoder has a 300 MHz maximum clock frequency and a 3.3 Gbps throughput with a nominal 1.2-V supply and performs within 3 dB of the Shannon limit at a BER of 10-5. With more than 60% power saving gained by early termination, the decoder consumes 10.4 pJ/bit/iteration at Eb=N0=4dB. Coupling early termination with supply voltage scaling results in an even lower energy consumption of 2.7 pJ/bit/iteration with 648 Mbps decoding throughput. The proposed techniques demonstrate that the bit-serial fully-parallel architecture is preferred to memory-based partially-parallel architectures, both in terms of throughput and energy efficiency, for applications such as 10GBase-T which use medium-size LDPC code (e.g., 2048 bit) and require multi-Gbps decoding throughput.

Low-complexity High-speed VLSI Design of Low-density Parity-check Decoders

Low-complexity High-speed VLSI Design of Low-density Parity-check Decoders
Author: Zhiqiang Cui
Publisher:
Total Pages: 218
Release: 2008
Genre: Decoders (Electronics)
ISBN:

Low-Density Parity-check (LDPC) codes have attracted considerable attention due to their capacity approaching performance over AWGN channel and highly parallelizable decoding schemes. They have been considered in a variety of industry standards for the next generation communication systems. In general, LDPC codes achieve outstanding performance with large codeword lengths (e.g., N>1000 bits), which lead to a linear increase of the size of memory for storing all the soft messages in LDPC decoding. In the next generation communication systems, the target data rates range from a few hundred Mbit/sec to several Gbit/sec. To achieve those very high decoding throughput, a large amount of computation units are required, which will significantly increase the hardware cost and power consumption of LDPC decoders. LDPC codes are decoded using iterative decoding algorithms. The decoding latency and power consumption are linearly proportional to the number of decoding iterations. A decoding approach with fast convergence speed is highly desired in practice. This thesis considers various VLSI design issues of LDPC decoder and develops efficient approaches for reducing memory requirement, low complexity implementation, and high speed decoding of LDPC codes. We propose a memory efficient partially parallel decoder architecture suited for quasi-cyclic LDPC (QC-LDPC) codes using Min-Sum decoding algorithm. We develop an efficient architecture for general permutation matrix based LDPC codes. We have explored various approaches to linearly increase the decoding throughput with a small amount of hardware overhead. We develop a multi-Gbit/sec LDPC decoder architecture for QC-LDPC codes and prototype an enhanced partially parallel decoder architecture for a Euclidian geometry based LDPC code on FPGA. We propose an early stopping scheme and an extended layered decoding method to reduce the number of decoding iterations for undecodable and decodable sequence received from channel. We also propose a low-complexity optimized 2-bit decoding approach which requires comparable implementation complexity to weighted bit flipping based algorithms but has much better decoding performance and faster convergence speed.

Area and Energy Efficient VLSI Architectures for Low-density Parity-check Decoders Using an On-the-fly Computation

Area and Energy Efficient VLSI Architectures for Low-density Parity-check Decoders Using an On-the-fly Computation
Author: Kiran Kumar Gunnam
Publisher:
Total Pages:
Release: 2010
Genre:
ISBN:

The VLSI implementation complexity of a low density parity check (LDPC) decoder is largely influenced by the interconnect and the storage requirements. This dissertation presents the decoder architectures for regular and irregular LDPC codes that provide substantial gains over existing academic and commercial implementations. Several structured properties of LDPC codes and decoding algorithms are observed and are used to construct hardware implementation with reduced processing complexity. The proposed architectures utilize an on-the-fly computation paradigm which permits scheduling of the computations in a way that the memory requirements and re-computations are reduced. Using this paradigm, the run-time configurable and multi-rate VLSI architectures for the rate compatible array LDPC codes and irregular block LDPC codes are designed. Rate compatible array codes are considered for DSL applications. Irregular block LDPC codes are proposed for IEEE 802.16e, IEEE 802.11n, and IEEE 802.20. When compared with a recent implementation of an 802.11n LDPC decoder, the proposed decoder reduces the logic complexity by 6.45x and memory complexity by 2x for a given data throughput. When compared to the latest reported multi-rate decoders, this decoder design has an area efficiency of around 5.5x and energy efficiency of 2.6x for a given data throughput. The numbers are normalized for a 180nm CMOS process. Properly designed array codes have low error floors and meet the requirements of magnetic channel and other applications which need several Gbps of data throughput. A high throughput and fixed code architecture for array LDPC codes has been designed. No modification to the code is performed as this can result in high error floors. This parallel decoder architecture has no routing congestion and is scalable for longer block lengths. When compared to the latest fixed code parallel decoders in the literature, this design has an area efficiency of around 36x and an energy efficiency of 3x for a given data throughput. Again, the numbers are normalized for a 180nm CMOS process. In summary, the design and analysis details of the proposed architectures are described in this dissertation. The results from the extensive simulation and VHDL verification on FPGA and ASIC design platforms are also presented.

Low-complexity Decoding Algorithms and Architectures for Non-binary LDPC Codes

Low-complexity Decoding Algorithms and Architectures for Non-binary LDPC Codes
Author: Fang Cai
Publisher:
Total Pages: 149
Release: 2013
Genre:
ISBN:

Non-binary low-density parity-check (NB-LDPC) codes can achieve better error-correcting performance than their binary counterparts when the code length is moderate at the cost of higher decoding complexity. The high complexity is mainly caused by the complicated computations in the check node processing and the large memory requirement. In this thesis, three decoding algorithms and corresponding VLSI architectures are proposed for NB-LDPC codes to lower the computational complexity and memory requirement. The first design is based on the proposed relaxed Min-max decoding algorithm. A novel relaxed check node processing scheme is proposed for the Min-max NB-LDPC decoding algorithm. Each finite field element of GF(2p̂) can be uniquely represented by a linear combination of $p$ independent field elements. Making use of this property, an innovative method is developed in this paper to first find a set of the p most reliable variable-to-check messages with independent field elements, called the minimum basis. Then the check-to-variable messages are efficiently computed from the minimum basis. With very small performance loss, the complexity of the check node processing can be substantially reduced using the proposed scheme. In addition, efficient VLSI architectures are developed to implement the proposed check node processing and overall NB-LDPC decoder. Compared to the most efficient prior design, the proposed decoder for a (837, 726) NB-LDPC code over GF(25̂) can achieve 52% higher efficiency in terms of throughput-over-area ratio. The second design is based on a proposed enhanced iterative hard reliability-based majority-logic decoding. The recently developed iterative hard reliability-based majority-logic NB-LDPC decoding has better performance-complexity tradeoffs than previous algorithms. Novel schemes are proposed for the iterative hard reliability-based majority-logic decoding (IHRB-MLGD). Compared to the IHRB algorithm, our enhanced (E- )IHRB algorithm can achieve significant coding gain with small hardware overhead. Then low-complexity partial-parallel NB-LDPC decoder architectures are developed based on these two algorithms. Many existing NB-LDPC code construction methods lead to quasi-cyclic or cyclic codes. Both types of codes are considered in our design. Moreover, novel schemes are developed to keep a small proportion of messages in order to reduce the memory requirement without causing noticeable performance loss. In addition, a shift-message structure is proposed by using memories concatenated with variable node units to enable efficient partial-parallel decoding for cyclic NB-LDPC codes. Compared to previous designs based on the Min-max decoding algorithm, our proposed decoders have at least tens of times lower complexity with moderate coding gain loss. The third design is based on a proposed check node decoding scheme using power representation of finite field elements. Novel schemes are proposed for the Min-max check node processing by making use of the cyclical-shift property of the power representation of finite field elements. Compared to previous designs based on the Min-max algorithm with forward-backward scheme, the proposed check node units (CNUs) do not need the complex switching network. Moreover, the multiplications of the parity check matrix entries are efficiently incorporated. For a Min-max NB-LDPC decoder over GF(32), the proposed scheme reduces the CNU area by at least 32%, and leads to higher clock frequency.

Decoder Architectures and Implementations for Quasi-cyclic Low-density Parity-check Codes

Decoder Architectures and Implementations for Quasi-cyclic Low-density Parity-check Codes
Author: Xiaoheng Chen
Publisher:
Total Pages:
Release: 2011
Genre:
ISBN: 9781124906669

Since the rediscovery of low-density parity-check (LDPC) codes in the late 1990s, tremendous progress has been made in code construction and design, decoding algorithms, and decoder implementation of these capacity-approaching codes. Recently, LDPC codes are considered for applications such as high-speed satellite and optical communications, the hard disk drives, and high-density flash memory based storage systems, which require that the codes are free of error-floor down to bit error rate (BER) as low as 10−12 to 10−15. FPGAs are usually used to evaluate the error performance of codes, since one can exploit the finite word length and extremely high internal memory bandwidth of an FPGA. Existing FPGA-based LDPC decoders fail to utilize the configurability and read-first mode of embedded memory in the FPGAs, and thus result in limited throughput and codes sizes. Four optimization techniques, i.e., vectorization, folding, message relocation, and circulant permutation matrix (CPM) sharing, are proposed to improve the throughput, scalability, and efficiency of FPGA-based decoders. Also, a semi-automatic CAD tool called QCSYN (Quasi-Cyclic LDPC decoder SYNthesis) is designed to shorten the implementation time of decoders. Using the above techniques, a high-rate (16129,15372) code is shown to have no error-floor down to the BER of 10−14. Also, it is very difficult to construct codes that do not exhibit an error floor down to 10−15 or so. Without detailed knowledge of dominant trapping sets, a backtracking-based reconfigurable decoder is designed to lower the error floor of a family of structurally compatible quasi-cyclic LDPC codes by one to two orders of magnitudes. Hardware reconfigurability is another significant feature of LDPC decoders. A tri-mode decoder for the (4095,3367) Euclidean geometry code is designed to work with three compatible binary message passing decoding algorithms. Note that this code contains 262080 edges (21.3 times of the (2048,1723) 10GBASE-T code) in its Tanner graph and is the largest code ever implemented. Besides, an efficient QC-LDPC Shift Network (QSN) is proposed to reduce the interconnect delay and control logic of circular shift network, a core component in the reconfigurable decoder that supports a family of structurally compatible codes. The interconnect delay and control logic area are reduced by a factor of 2.12 and 8, respectively. Non-binary LDPC codes are effective in combating burst errors. Using the power representation of the elements in the Galois field to organize both intrinsic and extrinsic messages, we present an efficient decoder architecture for non-binary QC-LDPC codes. The proposed decoder is reconfigurable and can be used to decode any code of a given field size. The decoder supports both regular and irregular non-binary QC-LDPC codes. Using a practical metric of throughput per unit area, the proposed implementation outperforms the best implementations published in research literature to date.

High-Performance Decoder Architectures For Low-Density Parity-Check Codes

High-Performance Decoder Architectures For Low-Density Parity-Check Codes
Author: Kai Zhang
Publisher:
Total Pages: 244
Release: 2012
Genre:
ISBN:

Abstract: The Low-Density Parity-Check (LDPC) codes, which were invented by Gallager back in 1960s, have attracted considerable attentions recently. Compared with other error correction codes, LDPC codes are well suited for wireless, optical, and magnetic recording systems due to their near- Shannon-limit error-correcting capacity, high intrinsic parallelism and high-throughput potentials. With these remarkable characteristics, LDPC codes have been adopted in several recent communication standards such as 802.11n (Wi-Fi), 802.16e (WiMax), 802.15.3c (WPAN), DVB-S2 and CMMB. This dissertation is devoted to exploring efficient VLSI architectures for high-performance LDPC decoders and LDPC-like detectors in sparse inter-symbol interference (ISI) channels. The performance of an LDPC decoder is mainly evaluated by area efficiency, error-correcting capability, throughput and rate flexibility. With this work we investigate tradeoffs between the four performance aspects and develop several decoder architectures to improve one or several performance aspects while maintaining acceptable values for other aspects ... Layered decoding algorithm, which is popular in LDPC decoding, is also adopted in this paper. Simulation results show that the layered decoding doubles the convergence speed of the iterative belief propagation process. Exploring the special structure of the connections between the check nodes and the variable nodes on the factor graph, we propose an effective detector architecture for generic sparse ISI channels to facilitate the practical application of the proposed detection algorithm. The proposed architecture is also reconfigurable in order to switch flexible connections on the factor graph in the time-varying ISI channels.

High-Performance and Energy-Efficient Decoder Design for Non-Binary LDPC Codes

High-Performance and Energy-Efficient Decoder Design for Non-Binary LDPC Codes
Author: Yuta Toriyama
Publisher:
Total Pages: 133
Release: 2016
Genre:
ISBN:

Binary Low-Density Parity-Check (LDPC) codes are a type of error correction code known to exhibit excellent error-correcting capabilities, and have increasingly been applied as the forward error correction solution in a multitude of systems and standards, such as wireless communications, wireline communications, and data storage systems. In the pursuit of codes with even higher coding gain, non-binary LDPC (NB-LDPC) codes defined over a Galois field of order q have risen as a strong replacement candidate. For codes defined with similar rate and length, NB-LDPC codes exhibit a significant coding gain improvement relative to that of their binary counterparts. Unfortunately, NB-LDPC codes are currently limited from practical application by the immense complexity of their decoding algorithms, because the improved error-rate performance of higher field orders comes at the cost of increasing decoding algorithm complexity. Currently available ASIC implementation solutions for NB-LDPC code decoders are simultaneously low in throughput and power-hungry, leading to a low energy efficiency. We propose several techniques at the algorithm level as well as hardware architecture level in an attempt to bring NB-LDPC codes closer to practical deployment. On the algorithm side, we propose several algorithmic modifications and analyze the corresponding hardware cost alleviation as well as impact on coding gain. We also study the quantization scheme for NB-LDPC decoders, again in the context of both the hardware and coding gain impacts, and we propose a technique that enables a good tradeoff in this space. On the hardware side, we develop a FPGA-based NB-LDPC decoder platform for architecture prototyping as well as hardware acceleration of code evaluation via error rate simulations. We also discuss the architectural techniques and innovations corresponding to our proposed algorithm for optimization of the implementation. Finally, a proof-of-concept ASIC chip is realized that integrates many of the proposed techniques. We are able to achieve a 3.7x improvement in the information throughput and 23.8x improvement in the energy efficiency over prior state-of-the-art, without sacrificing the strong error correcting capabilities of the NB-LDPC code.

Low-Density Parity-Check Decoder Architectures for Integrated Circuits and Quantum Cryptography

Low-Density Parity-Check Decoder Architectures for Integrated Circuits and Quantum Cryptography
Author: Mario Milicevic
Publisher:
Total Pages:
Release: 2017
Genre:
ISBN:

Forward error correction enables reliable one-way communication over noisy channels, by transmitting redundant data along with the message in order to detect and resolve errors at the receiver. Low-density parity-check (LDPC) codes achieve superior error-correction performance on Gaussian channels under belief propagation decoding, however, their complex parity-check matrix structure introduces hardware implementation challenges. This thesis explores how the quasi-cyclic structure of LDPC parity-check matrices can be exploited in the design of low-power hardware architectures for multi-Gigabit/second decoders realized in CMOS technology, as well as in the design and construction of multi-edge LDPC codes for long-distance (beyond 100km) quantum cryptography over optical fiber. A frame-interleaved architecture is presented with a path-unrolled message-passing schedule to reduce the complexity of routing interconnect in an integrated circuit decoder implementation. A proof-of-concept silicon test chip was fabricated in the 28nm CMOS technology node. The LDPC decoder chip supports the four codes presented in the IEEE 802.11ad standard, occupies an area of 3.41mm^2, and achieves an energy efficiency of 15pJ/bit while delivering a maximum throughput of 6.78Gb/s, and operating with a 202MHz clock at 0.9V supply. The test chip achieves the highest normalized energy efficiency among published CMOS-based decoders for the IEEE 802.11ad standard. A quasi-cyclic code construction technique is applied to a multi-edge LDPC code with block length of 10^6 bits in order to reduce the latency of LDPC decoding in the key reconciliation step of long-distance quantum key distribution. The GPU-based decoder achieves a maximum information throughput of 7.16Kb/s, and extends the current maximum transmission distance from 100km to 160km with a secret key rate of 4.10 x 10^(-7) bits/pulse under 8-dimensional reconciliation. The GPU-based decoder delivers up to 8.03x higher decoded information throughput over the upper bound on secret key rate for a lossy optical channel, thus demonstrating that key reconciliation with LDPC codes is no longer a post-processing bottleneck in quantum key distribution. The contributions presented in this thesis can be applied to future research in the implementation of silicon-based linear-program decoders for high-reliability channels, and single-chip solutions for quantum key distribution containing integrated photonics and post-processing algorithms.