Efficient Hardware Implementation Architectures for Generalized Integrated Interleaved Decoder

Author: Zhenshan Xie (Software engineer)
Publisher:
Total Pages: 0
Release: 2022
Genre: Decoders (Electronics)
ISBN:

Generalized integrated interleaved (GII) codes are advanced error-correcting codes. They nest Reed-Solomon (RS) or BCH sub-codewords to generate more powerful RS or BCH codewords. Their hyper-speed decoding and good error-correction capability make GII codes one of the best candidates for next-generation terabit/s digital storage and communications. However, the hardware architecture design for the GII decoder faces many challenges. Above all, the key equation solving (KES) in the nested decoding stage causes a clock frequency bottleneck and takes up a large portion of the GII decoder area. Besides, short GII-BCH codes are required for new fast storage class memories (SCMs), which poses new issues for the GII-BCH decoder design. Many techniques have been developed in this dissertation to eliminate the implementation bottlenecks for almost every decoding step in the decoder architecture design, especially for the nested KES. Major contributions include: i) an efficient nested KES algorithm and architecture to eliminate the clock frequency bottleneck and substantially reduce the area complexity; ii) a scaled nested KES algorithm and architecture to further reduce the area complexity by scaling polynomials to enable product term sharing; iii) a fast nested KES algorithm and architecture that breaks the data dependency to reduce the critical path to one multiplier and several adders/multiplexers and hence reduces the nested KES latency by almost half; iv) a scaled fast nested KES algorithm and architecture to further reduce the area complexity while keeping only one multiplier and several adders/multiplexers in the critical path; and v) a scheme to reduce the number of processing elements without undesirable degradation of the error-correcting performance. Compared to GII-RS decoding, the nested KES design for GII-BCH decoding is more challenging, since two higher-order syndromes instead of one need to be incorporated and every other iteration needs to be skipped. 
Efficient nested KES designs for GII-BCH codes have also been developed by algorithmic reformulations. For the overall GII decoder, the proposed designs can achieve more than 320 Gb/s throughput with only 7 gates in the critical path. Several effective schemes have also been proposed to address the issues of applying GII-BCH codes to the new fast SCM applications, where short codes with low redundancy and high correction capability are required. In this case, the error correction capabilities of the sub- and nested codewords of the GII-BCH codes are relatively small, leading to issues regarding the KES throughput/latency and decoding miscorrections. i) A high-throughput sub-word KES was developed to directly compute the polynomials and variables for 3-error-correcting decoding. Utilizing the properties of the involved variables and syndromes, reformulations were developed to enable product term sharing and hence substantially simplify the polynomial and variable computation. Almost three times the throughput can be achieved with smaller area, compared to the best previous design. ii) An efficient nested KES design has been proposed to eliminate the initialization clock cycle from each nested decoding round. The polynomial updating was split and the critical path was reduced to one multiplier and several adders/multiplexers without pre-computing combined scalars. Substantial area saving can be achieved by sharing hardware units for polynomial updating. iii) Three low-complexity methods, i.e., checking nested syndromes, utilizing extended BCH codes, and tracking error locator polynomial degrees, have been proposed to detect and mitigate the miscorrections in the decoding of short GII-BCH codes, and hence the severe performance loss can be almost completely eliminated. iv) The miscorrection mitigation schemes were further optimized and the average nested decoding latency was reduced significantly. 
v) A sub-word selection strategy and a higher-order syndrome updating scheme were developed to reduce the worst-case nested decoding latency substantially. For an example short GII-BCH code over $GF(2^{10})$ for SCM applications, the performance gap due to miscorrections is closed and low-complexity, low-latency decoding is achieved. In summary, the proposed designs make significant contributions to the GII decoder architecture design, especially the nested KES, and to the decoding of short GII-BCH codes. Future research can focus on the joint architecture design for other decoder components, more efficient miscorrection mitigation schemes, and concise formulas for performance estimation.
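
The syndrome checks mentioned in this abstract rest on evaluating a received word at consecutive powers of a primitive field element: a valid BCH codeword gives all-zero syndromes, and any nonzero syndrome signals an error. The following is a minimal sketch of that computation only, not the dissertation's architecture. It assumes a toy field GF(2^4) with primitive polynomial x^4 + x + 1 and the (15, 7) two-error-correcting BCH code instead of the GF(2^10) code from the abstract; the names `gf_mul` and `syndromes` are hypothetical.

```python
# Toy field GF(2^4) with primitive polynomial x^4 + x + 1. This is an
# assumption for illustration; the abstract's example code is over GF(2^10).
M = 4
PRIM_POLY = 0b10011
N = (1 << M) - 1  # order of alpha, which is also the code length (15)

# EXP[i] = alpha^i. The table is doubled so that adding two logs never
# needs a modulo reduction.
EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
value = 1
for i in range(N):
    EXP[i] = value
    LOG[value] = i
    value <<= 1
    if value & (1 << M):
        value ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def gf_mul(a, b):
    """Multiply two field elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(bits, count):
    """Return [S_1, ..., S_count] with S_i = r(alpha^i).

    bits[j] is the coefficient of x^j in the received polynomial r(x).
    Each evaluation uses Horner's rule, starting from the highest degree.
    """
    result = []
    for i in range(1, count + 1):
        point = EXP[i % N]
        acc = 0
        for bit in reversed(bits):
            acc = gf_mul(acc, point) ^ bit
        result.append(acc)
    return result


# g(x) = x^8 + x^7 + x^6 + x^4 + 1 generates the (15, 7) BCH code whose
# roots include alpha, alpha^2, alpha^3 and alpha^4, so g(x) is a codeword.
codeword = [0] * N
for power in (0, 4, 6, 7, 8):
    codeword[power] = 1

# Flip one bit to model a single channel error at position 5.
received = list(codeword)
received[5] ^= 1
```

Here `syndromes(codeword, 4)` is all zeros, while for `received` each syndrome S_i equals alpha^(5i), which is what a key equation solver would then use to locate the error.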

Efficient Hardware Implementation of an Advanced Turbo Decoder

Author: Naresh Kumar Venkatesh
Publisher: LAP Lambert Academic Publishing
Total Pages: 96
Release: 2012-04
Genre:
ISBN: 9783847308553

The turbo decoder is a key component of emerging 3G mobile communication systems. The focus of this work is on developing an application-specific integrated circuit for an advanced turbo decoder. The methodology starts from RTL models, which can be used for a software solution, and proceeds towards hardware implementation. In the current project work, a turbo encoder and a turbo decoder with SOVA and log-MAP decoding algorithms were modelled at the algorithmic level, concentrating on functional correctness rather than on implementation architecture. The effect on performance of variations in parameters such as frame length, number of iterations, type of encoding scheme and type of interleaver, in the presence of additive white Gaussian noise, was studied using MATLAB. The hardware of the turbo decoder has been modelled in VHDL, simulated in VCS, and synthesized using Design Compiler, and the physical implementation has been carried out using IC Compiler.

Resource Efficient LDPC Decoders

Author: Vikram Arkalgud Chandrasetty
Publisher: Academic Press
Total Pages: 192
Release: 2017-12-05
Genre: Technology & Engineering
ISBN: 0128112565

This book takes a practical hands-on approach to developing low complexity algorithms and transforming them into working hardware. It follows a complete design approach, from algorithms to hardware architectures, and addresses some of the challenges associated with their design, providing insight into implementing innovative architectures based on low complexity algorithms. The reader will learn:
- Modern techniques to design, model and analyze low complexity LDPC algorithms as well as their hardware implementation
- How to reduce computational complexity and power consumption using computer aided design techniques
- All aspects of the design spectrum from algorithms to hardware implementation and performance trade-offs
- Provides extensive treatment of LDPC decoding algorithms and hardware implementations
- Gives a systematic guidance, giving a basic understanding of LDPC codes and decoding algorithms and providing practical skills in implementing efficient LDPC decoders in hardware
- Companion website containing C-Programs and MATLAB models for simulating the algorithms, and Verilog HDL codes for hardware modeling and synthesis

Turbo Decoder Architecture for Beyond-4G Applications

Author: Cheng-Chi Wong
Publisher: Springer Science & Business Media
Total Pages: 106
Release: 2013-10-01
Genre: Technology & Engineering
ISBN: 1461483107

This book describes the most recent techniques for turbo decoder implementation, especially for 4G and beyond 4G applications. The authors reveal techniques for the design of high-throughput decoders for future telecommunication systems, enabling designers to reduce hardware cost and shorten processing time. Coverage includes an explanation of VLSI implementation of the turbo decoder, from basic functional units to advanced parallel architecture. The authors discuss both hardware architecture techniques and experimental results, showing the variations in area/throughput/performance with respect to several techniques. This book also illustrates turbo decoders for 3GPP-LTE/LTE-A and IEEE 802.16e/m standards, which provide a low-complexity but high-flexibility circuit structure to support these standards in multiple parallel modes. Moreover, some solutions that can overcome the limitation upon the speedup of parallel architecture by modification to turbo codec are presented here. Compared to the traditional designs, these methods can lead to at most 33% gain in throughput with similar performance and similar cost.

Efficient VLSI Architectures for Algebraic Soft-decision Decoding of Reed-Solomon Codes

Author: Jiangli Zhu
Publisher:
Total Pages: 177
Release: 2011
Genre:
ISBN:

Algebraic soft-decision decoding (ASD) algorithms of Reed-Solomon (RS) codes have attracted much interest due to their significant coding gain and polynomial complexity. Practical ASD algorithms include the Koetter-Vardy, low-complexity Chase (LCC) and bit-level generalized minimum distance (BGMD) decodings. This thesis focuses on the design of efficient VLSI architectures for ASD decoders. One major step of ASD algorithms is the interpolation. Available interpolation algorithms can only add interpolation points or increase interpolation multiplicities. However, backward interpolation, which eliminates interpolation points or reduces multiplicities, is indispensable to enable the re-using of interpolation results. In this thesis, a novel backward interpolation is first proposed for the LCC decoding through constructing equivalent Gröbner bases. In the LCC decoding, $2^{\eta}$ test vectors need to be interpolated over. With backward interpolation, the interpolation result for each of the second and later test vectors can be computed by only one backward and one forward interpolation iteration. Compared to the previous design, the proposed backward-forward interpolation scheme can lead to significant memory saving. To reduce the interpolation latency of the LCC decoding, a unified backward-forward interpolation is proposed to carry out both interpolations in a single iteration. With only 40 percent area overhead, the proposed unified interpolation architecture can almost double the throughput when a large $\eta$ is adopted. Moreover, a reduced-complexity multi-interpolator scheme is developed for the low-latency LCC decoding. The proposed backward interpolation is further extended to the iterative BGMD decoding. By reusing the interpolation results, at least 40 percent of the interpolation iterations can be saved for a (255, 239) code while the area overhead is small. Further speedup of the BGMD interpolation is limited by the inherent serial nature of the interpolation algorithm. 
In this thesis, a novel interpolation scheme that can combine multiple interpolation iterations is developed. Efficient architectures are presented to integrate the combined and backward interpolation techniques. A combined-backward interpolator of a (255, 239) code is implemented and can achieve a throughput of 440 Mbps on a Xilinx XC2V4000 FPGA device. Compared to the previous fastest implementation, our implementation can achieve a speedup of 64 percent with 51 percent less FPGA resource. The factorization is another major step of ASD algorithms. In the re-encoded LCC decoding, it is proved that the factorization step can be eliminated. Hence, the LCC decoder can be further simplified. In the re-encoded ASD decoders, a re-encoder and an erasure decoder need to be added. These two blocks can take a significant proportion of the overall decoder area and may limit the achievable throughput. An efficient re-encoder design is proposed by computing the erasure locator and evaluator through direct multiplications and reformulating other involved computations. When applied to a (255, 239) code, our re-encoder can achieve 82 percent higher throughput than the previous design with 11 percent less area. With minor modifications, the proposed design can also be used to implement the erasure decoder. After applying available complexity-reducing techniques, complexity comparisons for three practical ASD decoders were carried out. It is derived that the LCC decoder can achieve similar or higher coding gain with lower complexity for high-rate codes. This thesis also provides discussions on how the hardware complexities of ASD decoders change with codeword length, code rate and other parameters.
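
To make the backward-forward idea in this abstract concrete: in Chase-type LCC decoding as commonly described, each test vector takes either the hard-decision symbol or the second most likely symbol at each of a small number (eta) of least reliable positions. If the test vectors are visited in Gray-code order, consecutive vectors differ in exactly one position, so moving from one to the next means dropping one interpolation point and adding one. The sketch below only enumerates test vectors in that order; the Gray-code ordering and the function name `gray_ordered_test_vectors` are assumptions for illustration and are not taken from the thesis.

```python
def gray_ordered_test_vectors(hard, second, unreliable):
    """List the 2^eta test vectors in Gray-code order.

    hard       -- hard-decision symbols, one per code position
    second     -- second most likely symbol for each code position
    unreliable -- indices of the eta least reliable positions
    """
    eta = len(unreliable)
    vectors = []
    for k in range(1 << eta):
        gray = k ^ (k >> 1)  # consecutive Gray codes differ in one bit
        vector = list(hard)
        for bit, position in enumerate(unreliable):
            if (gray >> bit) & 1:
                vector[position] = second[position]
        vectors.append(vector)
    return vectors


# Hypothetical 5-symbol word with eta = 3 unreliable positions.
hard = [1, 2, 3, 4, 5]
second = [9, 8, 7, 6, 0]
vectors = gray_ordered_test_vectors(hard, second, [0, 2, 4])
```

The first vector is the hard-decision word itself, and each later vector changes a single symbol relative to its predecessor, which is the situation in which one backward and one forward interpolation step suffice.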

Turbo-like Codes

Author: Aliazam Abbasfar
Publisher: Springer
Total Pages: 84
Release: 2007-08-28
Genre: Technology & Engineering
ISBN: 9781402063909

This book introduces the turbo error-correcting concept in simple language, including a general theory and the algorithms for decoding turbo-like codes. It presents a unified framework for the design and analysis of turbo codes and LDPC codes and their decoding algorithms. A major focus is on high-speed turbo decoding, which targets applications with data rates of several hundred million bits per second (Mbps).

Efficient Decoder Design for Error Correction Codes

Author: Jinjin He
Publisher:
Total Pages: 238
Release: 2010
Genre: Error-correcting codes (Information theory)
ISBN:

Error correction codes (ECCs) have been widely used in communication systems and storage devices. Nowadays, the rapid development of integrated circuit technologies makes feasible the implementation of powerful ECCs such as turbo code and low-density parity-check (LDPC) code. However, these high-performance codes require complex decoding algorithms, resulting in large hardware area and high power consumption. Furthermore, some of these decoders require an iterative decoding process, which leads to a long decoding latency. Therefore, low-complexity, low-power and high-speed very-large-scale integration (VLSI) architecture design for the ECC decoder is of great importance. This dissertation focuses on efficient VLSI implementation for the decoders of convolutional codes and two advanced coding schemes based on convolutional code: trellis-coded modulation (TCM) and convolutional turbo code (CTC). The first part of this dissertation is dedicated to low-complexity, low-power decoders design for a 4-dimensional, 8-ary phase-shift keying (4-D 8PSK) TCM system. We propose a low-complexity architecture for the transition-metric unit (TMU) to reduce the hardware area without performance loss. Then, a power-efficient scheme by applying T-algorithm on branch metrics (BMs) is proposed for the Viterbi decoder (VD) embedded in the 4-D 8PSK TCM decoder. Unlike the conventional T-algorithm, the proposed scheme does not affect the clock speed of the decoder. Finally, a hybrid T-algorithm is developed by applying T-algorithm on both BMs and path metrics (PMs), which reduces significantly more computations than the conventional T-algorithm applied on PMs. The VLSI design for VDs has been an active research area for decades. In the second part of the dissertation, we extend our research to a more general topic of VDs, where novel architectures are explored to efficiently reduce the power consumption, while still maintaining a high decoding speed and a low decoding latency. 
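
As a rough illustration of the conventional T-algorithm that this abstract takes as its baseline: at each trellis step, only states whose path metric lies within a threshold T of the best path metric are kept, and the rest are pruned to save computation. The sketch below assumes that a smaller metric is better and uses the hypothetical name `t_algorithm_survivors`; it shows pruning on path metrics only and does not reproduce the dissertation's branch-metric or hybrid schemes.

```python
def t_algorithm_survivors(path_metrics, threshold):
    """Keep the states whose path metric is within `threshold` of the best.

    path_metrics -- dict mapping state -> accumulated path metric
                    (assumed convention: smaller is better)
    threshold    -- the pruning threshold T
    """
    best = min(path_metrics.values())
    return {
        state: metric
        for state, metric in path_metrics.items()
        if metric - best <= threshold
    }


# Hypothetical path metrics for a 4-state trellis at one time step.
metrics = {0: 3.0, 1: 4.5, 2: 9.0, 3: 3.5}
survivors = t_algorithm_survivors(metrics, threshold=2.0)
```

With these numbers, state 2 is discarded because its metric exceeds the best metric (3.0) by more than 2.0. Finding the best metric sits on the recursion's critical path in hardware, which is the clock-speed concern the abstract alludes to.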
CTCs are constructed from parallel convolutional encoding of the same message in different sequences and have the error-correcting capability near the Shannon bound. Practical decoding schemes normally require an iterative decoding process employing the soft-in soft-out (SISO) decoder. The third part of this dissertation is focused on the SISO decoder design for double-binary (DB) CTCs. We propose a low-complexity, memory-reduced architecture by partitioning BMs into two independent portions: information metrics and parity metrics. Furthermore, high-speed recursion architectures for logarithm domain maximum a posteriori probability (log-MAP) algorithm are proposed to increase the decoding speed by algorithmic approximation and bit-level optimization.
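
The log-MAP recursion mentioned here is built on the Jacobian logarithm, often written max*, which computes log(e^a + e^b) as a maximum plus a small correction term. Dropping the correction gives the max-log-MAP approximation, one common way to shorten the recursion's critical path. The sketch below shows both operations; the abstract does not say which approximation the dissertation uses, so the max-log variant is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math


def max_star(a, b):
    """Exact Jacobian logarithm: log(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a, b):
    """Max-log approximation: the correction term is dropped."""
    return max(a, b)


# Hypothetical pair of log-domain metrics entering one recursion step.
a, b = 1.0, 2.5
exact = max_star(a, b)
approx = max_log(a, b)
```

The approximation error is never negative and is at most log(2), reached when the two inputs are equal, which is why hardware designs often replace the correction term with a small lookup table or omit it.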

VLSI Architectures for Modern Error-Correcting Codes

Author: Xinmiao Zhang
Publisher: CRC Press
Total Pages: 410
Release: 2017-12-19
Genre: Technology & Engineering
ISBN: 148222965X

Error-correcting codes are ubiquitous. They are adopted in almost every modern digital communication and storage system, such as wireless communications, optical communications, Flash memories, computer hard drives, sensor networks, and deep-space probing. New-generation and emerging applications demand codes with better error-correcting capability. On the other hand, the design and implementation of those high-gain error-correcting codes pose many challenges. They usually involve complex mathematical computations, and mapping them directly to hardware often leads to very high complexity. VLSI Architectures for Modern Error-Correcting Codes serves as a bridge connecting advancements in coding theory to practical hardware implementations. Instead of focusing on circuit-level design techniques, the book highlights integrated algorithmic and architectural transformations that lead to great improvements on throughput, silicon area requirement, and/or power consumption in the hardware implementation. The goal of this book is to provide a comprehensive and systematic review of available techniques and architectures, so that they can be easily followed by system and hardware designers to develop en/decoder implementations that meet error-correcting performance and cost requirements. This book can be also used as a reference for graduate-level courses on VLSI design and error-correcting coding. Particular emphases are placed on hard- and soft-decision Reed-Solomon (RS) and Bose-Chaudhuri-Hocquenghem (BCH) codes, and binary and non-binary low-density parity-check (LDPC) codes. These codes are among the best candidates for modern and emerging applications due to their good error-correcting performance and lower implementation complexity compared to other codes. To help explain the computations and en/decoder architectures, many examples and case studies are included. 
More importantly, discussions are provided on the advantages and drawbacks of different implementation approaches and architectures.

VLSI

Author: Zhongfeng Wang
Publisher: BoD – Books on Demand
Total Pages: 467
Release: 2010-02-01
Genre: Technology & Engineering
ISBN: 9533070498

Integrated circuit (IC) technology entered the era of VLSI (Very Large Scale Integration) in the 1970s, when thousands of transistors were integrated into a single chip. Nowadays we are able to integrate more than a billion transistors on a single chip. However, the term “VLSI” is still being used, though there was some effort many years ago to coin a new term, ULSI (Ultra-Large Scale Integration), for finer distinction. VLSI technology has brought tremendous benefits to our everyday life since its emergence. VLSI circuits are used everywhere; real applications include microprocessors in a personal computer or workstation, chips in a graphics card, digital camera or camcorder, chips in a cell phone or a portable computing device, embedded processors in an automobile, etc. VLSI covers many phases of design and fabrication of integrated circuits. For a commercial chip design, it involves system definition, VLSI architecture design and optimization, RTL (register transfer language) coding, (pre- and post-synthesis) simulation and verification, synthesis, place and route, timing analyses and timing closure, and multi-step semiconductor device fabrication including wafer processing, die preparation, IC packaging and testing, etc. As the process technology scales down, hundreds or even thousands of millions of transistors are integrated into one single chip. Hence, more and more complicated systems can be integrated into a single chip, the so-called System-on-chip (SoC), which presents VLSI engineers with ever-increasing challenges in mastering techniques across the various phases of VLSI design. For modern SoC design, practical applications are usually speed hungry. For instance, the Ethernet standard has evolved from 10Mbps to 10Gbps. Now the specification for 100Gbps Ethernet is on the way. On the other hand, with the popularity of wireless and portable computing devices, low power consumption has become extremely critical. 
To meet these contradicting requirements, VLSI designers have to perform optimizations at all levels of design. This book is intended to cover a wide range of VLSI design topics. The book can be roughly partitioned into four parts. Part I is mainly focused on algorithmic level and architectural level VLSI design and optimization for image and video signal processing systems. Part II addresses VLSI design optimizations for cryptography and error correction coding. Part III discusses general SoC design techniques as well as other application-specific VLSI design optimizations. The last part will cover generic nano-scale circuit-level design techniques.

High-level Synthesis

Author: Michael Fingeroff
Publisher: Xlibris Corporation
Total Pages: 334
Release: 2010
Genre: Computers
ISBN: 1450097243

Are you an RTL or system designer that is currently using, moving, or planning to move to an HLS design environment? Finally, a comprehensive guide for designing hardware using C++ is here. Michael Fingeroff's High-Level Synthesis Blue Book presents the most effective C++ synthesis coding style for achieving high quality RTL. Master a totally new design methodology for coding increasingly complex designs! This book provides a step-by-step approach to using C++ as a hardware design language, including an introduction to the basics of HLS using concepts familiar to RTL designers. Each chapter provides easy-to-understand C++ examples, along with hardware and timing diagrams where appropriate. The book progresses from simple concepts such as sequential logic design to more complicated topics such as memory architecture and hierarchical sub-system design. Later chapters bring together many of the earlier HLS design concepts through their application in simplified design examples. These examples illustrate the fundamental principles behind C++ hardware design, which will translate to much larger designs. Although this book focuses primarily on C and C++ to present the basics of C++ synthesis, all of the concepts are equally applicable to SystemC when describing the core algorithmic part of a design. On completion of this book, readers should be well on their way to becoming experts in high-level synthesis.