Linear Prediction of Speech

Author: J.D. Markel
Publisher: Springer Science & Business Media
Total Pages: 276
Release: 2013-03-12
Genre: Science
ISBN: 3642662862

During the past ten years a new area in speech processing, generally referred to as linear prediction, has evolved. As with all scientific research, results did not always get published in a logical order and terminology was not always consistent. In mid-1974, we decided to begin an extra hours and weekends project of organizing the literature in linear prediction of speech and developing it into a unified presentation in terms of content and terminology. This effort was completed in November, 1975, with the contents presented herein. If there are two words which describe our goals in this book, they are unification and depth. Considerable effort has been spent on showing the interrelationships among various linear prediction formulations and solutions, and in developing extensions such as acoustic tube models and synthesis filter structures in a unified manner with consistent terminology. Topics are presented in such a manner that derivations and theoretical details are covered, along with Fortran subroutines and practical considerations. Using this approach we hope to have made the material useful for a wide range of backgrounds and interests.

Speech Coding

Author: Tom Bäckström
Publisher: Springer
Total Pages: 251
Release: 2017-03-29
Genre: Technology & Engineering
ISBN: 3319502042

This book provides a scientific understanding of the most central techniques used in speech coding, both for advanced students and for professionals with a background in speech, audio, and/or digital signal processing. It provides a clear connection between the whys, hows, and whats, such that the necessity, purpose, and solutions provided by tools are always within sight, as well as their strengths and weaknesses in each respect. Equivalently, this book sheds light on the following perspectives for each technology presented: Objective: What do we want to achieve, and especially why is this goal important? Resource / Information: What information is available and how can it be useful? Resource / Platform: What kind of platforms are we working with and what are the capabilities and restrictions of those platforms? This includes properties such as the computational, memory, acoustic, and transmission capacity of the devices used. Solutions: Which solutions have been proposed and how can they be used to reach the stated goals? Strengths and weaknesses: In which ways do the solutions fulfill the objectives and where are they insufficient? Are resources used efficiently? This book concentrates solely on code excited linear prediction and its derivatives, since mainstream speech codecs are based on linear prediction. It also concentrates exclusively on time domain techniques, because frequency domain tools are to a large extent common with audio codecs.

Algorithms and Software for Predictive and Perceptual Modeling of Speech

Author: Venkatraman Atti
Publisher: Morgan & Claypool Publishers
Total Pages: 124
Release: 2010-05-05
Genre: Technology & Engineering
ISBN: 160845388X

From the early pulse code modulation-based coders to some of the recent multi-rate wideband speech coding standards, the area of speech coding has made several significant strides toward the objective of attaining high speech quality at the lowest possible bit rate. This book presents some of the recent advances in linear prediction (LP)-based speech analysis that employ perceptual models for narrow- and wide-band speech coding. The LP analysis-synthesis framework has been successful for speech coding because it fits the source-system paradigm for speech synthesis well. Limitations associated with conventional LP have been studied extensively, and several extensions to LP-based analysis-synthesis have been proposed, e.g., the discrete all-pole modeling, the perceptual LP, the warped LP, the LP with modified filter structures, the IIR-based pure LP, all-pole modeling using the weighted-sum of LSP polynomials, the LP for low frequency emphasis, and the cascade-form LP. These extensions can be classified as algorithms that either attempt to improve the LP spectral envelope fitting performance or embed perceptual models in the LP. The first half of the book reviews some of the recent developments in predictive modeling of speech with the help of MATLAB simulation examples. The advantages of integrating perceptual models in low bit rate speech coding depend on how accurately these models mimic human performance and, more importantly, on the achievable "coding gains" and "computational overhead" associated with these physiological models. Methods that exploit the masking properties of the human ear in speech coding standards, even today, are largely based on concepts introduced by Schroeder and Atal in 1979. For example, a simple approach employed in speech coding standards is to use a perceptual weighting filter to shape the quantization noise according to the masking properties of the human ear.
The second half of the book reviews some of the recent developments in perceptual modeling of speech (e.g., masking threshold, psychoacoustic models, auditory excitation pattern, and loudness) with the help of MATLAB simulations. Supplementary material, including the MATLAB programs and simulation examples presented in this book, is also available. Table of Contents: Introduction / Predictive Modeling of Speech / Perceptual Modeling of Speech
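
To make the perceptual weighting filter mentioned in this description concrete, the following is a minimal Python sketch, not taken from the book. It assumes one commonly used form of the filter, W(z) = A(z/γ1) / A(z/γ2), where A(z) = 1 + a1 z^-1 + ... + ap z^-p is the LP inverse filter. The function names and the values γ1 = 0.9 and γ2 = 0.6 are illustrative assumptions only.

```python
def bandwidth_expand(a, gamma):
    """Return the coefficients of A(z/gamma): the k-th coefficient is scaled by gamma**k."""
    return [coef * gamma ** k for k, coef in enumerate(a)]


def perceptual_weighting(signal, a, gamma1=0.9, gamma2=0.6):
    """Filter `signal` through W(z) = A(z/gamma1) / A(z/gamma2).

    `a` holds [1, a1, ..., ap], the coefficients of A(z) = 1 + sum_k a_k z^-k.
    gamma1 and gamma2 are illustrative defaults, not values from the book.
    """
    num = bandwidth_expand(a, gamma1)  # numerator (FIR part)
    den = bandwidth_expand(a, gamma2)  # denominator (recursive part), den[0] == 1
    out = []
    for n in range(len(signal)):
        # Feed-forward contribution from current and past inputs
        acc = sum(num[k] * signal[n - k] for k in range(len(num)) if n - k >= 0)
        # Feedback contribution from past outputs
        acc -= sum(den[k] * out[n - k] for k in range(1, len(den)) if n - k >= 0)
        out.append(acc)
    return out


# Impulse response for a hypothetical first-order predictor A(z) = 1 - 0.9 z^-1
impulse = [1.0, 0.0, 0.0, 0.0]
response = perceptual_weighting(impulse, [1.0, -0.9])
```

Setting γ1 equal to γ2 makes the numerator and denominator cancel, so the filter passes the signal unchanged; moving the two values apart changes how strongly the noise is shaped toward the formant regions.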

Speech Dereverberation

Author: Patrick A. Naylor
Publisher: Springer Science & Business Media
Total Pages: 388
Release: 2010-07-27
Genre: Technology & Engineering
ISBN: 1849960569

Speech Dereverberation gathers together an overview, a mathematical formulation of the problem, and the state-of-the-art solutions for dereverberation. Speech Dereverberation presents current approaches to the problem of reverberation. It provides a review of topics in room acoustics and also describes performance measures for dereverberation. The algorithms are then explained with mathematical analysis and examples that enable the reader to see the strengths and weaknesses of the various techniques and to understand the questions still to be addressed. Techniques rooted in speech enhancement are included, in addition to a treatment of multichannel blind acoustic system identification and inversion. The TRINICON framework is shown in the context of dereverberation to be a generalization of the signal processing for a range of analysis and enhancement techniques. Speech Dereverberation is suitable for students at master's and doctoral level, as well as established researchers.

Linear Predictive Coding and the Internet Protocol

Author: Robert M. Gray
Publisher: Now Publishers Inc
Total Pages: 181
Release: 2010
Genre: Computers
ISBN: 1601983484

In December 1974 the first realtime conversation on the ARPAnet took place between Culler-Harrison Incorporated in Goleta, California, and MIT Lincoln Laboratory in Lexington, Massachusetts. This was the first successful application of realtime digital speech communication over a packet network and an early milestone in the explosion of realtime signal processing of speech, audio, images, and video that we all take for granted today. It could be considered as the first voice over Internet Protocol (VoIP), except that the Internet Protocol (IP) had not yet been established. In fact, the interest in realtime signal processing had an indirect, but major, impact on the development of IP. This is the story of the development of linear predictive coded (LPC) speech and how it came to be used in the first successful packet speech experiments. Several related stories are recounted as well. The history is preceded by a tutorial on linear prediction methods which incorporates a variety of views to provide context for the stories. This part is a technical survey of the fundamental ideas of linear prediction that are important for speech processing, but the development departs from traditional treatments and takes advantage of several shortcuts, simplifications, and unifications that come with years of hindsight. In particular, some of the key results are proved using short and simple techniques that are not as well known as they should be, and it also addresses some of the common assumptions made when modeling random signals. The reader interested only in the history and already familiar with or uninterested in the technical details of linear prediction and speech may skip Part I entirely.

Speech Coding Algorithms

Author: Wai C. Chu
Publisher: John Wiley & Sons
Total Pages: 584
Release: 2004-03-04
Genre: Computers
ISBN: 0471668877

Speech coding is a highly mature branch of signal processing deployed in products such as cellular phones, communication devices, and, more recently, voice over internet protocol. This book collects many of the techniques used in speech coding and presents them in an accessible fashion. It emphasizes the foundation and evolution of standardized speech coders, covering standards from 1984 to the present. The theory behind the applications is thoroughly analyzed and proved.

Advances in Non-Linear Modeling for Speech Processing

Author: Raghunath S. Holambe
Publisher: Springer Science & Business Media
Total Pages: 109
Release: 2012-02-21
Genre: Technology & Engineering
ISBN: 1461415047

Advances in Non-Linear Modeling for Speech Processing covers advanced topics in non-linear estimation and modeling techniques, along with their applications to speaker recognition. A non-linear aeroacoustic modeling approach is used to estimate the important fine-structure speech events that are not revealed by the short-time Fourier transform (STFT). This aeroacoustic modeling approach provides the impetus for the high-resolution Teager energy operator (TEO), which is characterized by a time resolution that can track rapid signal energy changes within a glottal cycle. Cepstral features such as linear prediction cepstral coefficients (LPCC) and mel frequency cepstral coefficients (MFCC) are computed from the magnitude spectrum of the speech frame, and the phase spectrum is neglected. To overcome the problem of neglecting the phase spectrum, the speech production system can be represented as an amplitude modulation-frequency modulation (AM-FM) model. To demodulate the speech signal, that is, to estimate the amplitude envelope and instantaneous frequency components, the energy separation algorithm (ESA) and the Hilbert transform demodulation (HTD) algorithm are discussed. Different features derived using the above non-linear modeling techniques are used to develop a speaker identification system. Finally, it is shown that the fusion of speech production and speech perception mechanisms can lead to a robust feature set.
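
As a rough illustration of why the Teager energy operator has such fine time resolution, here is a minimal Python sketch, not taken from the book. It uses the standard discrete-time definition Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), which needs only three adjacent samples per output value. The function name and the test tone parameters are illustrative assumptions.

```python
import math


def teager_energy(x):
    """Discrete Teager energy psi[n] = x[n]^2 - x[n-1] * x[n+1] for interior samples.

    The result has len(x) - 2 values because the first and last samples
    lack one of their two neighbours.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# For a pure tone A*cos(omega*n), the operator yields the constant A^2 * sin(omega)^2,
# so it responds to both amplitude and frequency using a three-sample window.
amplitude, omega = 2.0, 0.3  # hypothetical tone parameters
tone = [amplitude * math.cos(omega * n) for n in range(50)]
psi = teager_energy(tone)
expected = (amplitude * math.sin(omega)) ** 2
```

Because each output depends on just three samples, a change in amplitude or frequency shows up in the operator's output almost immediately, which is the property that lets it follow energy changes within a glottal cycle.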