This book is an outgrowth of ten years of research at the University of Florida Computational NeuroEngineering Laboratory (CNEL) in the general area of statistical signal processing and machine learning. One of the goals of writing the book is precisely to bridge these two fields, which share many common problems and techniques but are not yet collaborating effectively. Unlike other books that cover the state of the art in a given field, this book cuts across engineering (signal processing) and statistics (machine learning) with a common theme: learning seen from the point of view of information theory, with an emphasis on Rényi's definition of information. The basic approach is to utilize the information-theoretic descriptors of entropy and divergence as nonparametric cost functions for the design of adaptive systems in unsupervised or supervised training modes. Hence the title: Information-Theoretic Learning (ITL).

In the course of these studies, we discovered that the main idea enabling a synergistic view as well as algorithmic implementations does not involve the conventional central moments of the data (mean and covariance). Rather, the core concept is the α-norm of the PDF, in particular its expected value (α = 2), which we call the information potential. This operator and its related nonparametric estimators link information theory, optimization of adaptive systems, and reproducing kernel Hilbert spaces in a simple and unconventional way.

Due to the pervasive nature of learning, the material requires prior basic knowledge of a broad set of subjects such as information theory, density estimation, adaptive filtering, pattern recognition, reproducing kernel Hilbert spaces (RKHS), and kernel machines. Because there are few researchers with such broad interests, the first chapter provides, in simple terms, the minimal foundations of information theory, adaptive filtering, and RKHS, while the appendix reviews density estimation. Once the reader grasps these fundamentals, the book develops a nonparametric framework that is rich in understanding, setting the stage for the evolution of a new generation of algorithms of varying complexity. This book is therefore useful for professionals who are interested in improving the performance of traditional algorithms as well as researchers who are interested in exploring new approaches to machine learning.

This thematic view of a broad research area is a double-edged sword. By using the same approach to treat many different problems, it provides a unique and unifying perspective. On the other hand, it leaves out many competing alternatives and complicates the evaluation of solutions. For this reason, we present many examples to illustrate performance and compare it with conventional alternatives in the context of practical problems.
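To make the central quantity concrete before the formal development in later chapters: for α = 2 the information potential is V(X) = ∫ p(x)² dx = E[p(X)], and its Parzen-window estimate with Gaussian kernels collapses to a double sum of pairwise kernel evaluations, because the convolution of two Gaussians of bandwidth σ is a Gaussian of bandwidth σ√2. The following is a minimal sketch (the function name, the bandwidth value, and the use of Python are our illustrative choices, not prescriptions from the text):

```python
import numpy as np

def information_potential(x, sigma=1.0):
    """Parzen-window estimate of the quadratic information potential
    V(X) = E[p(X)] for a 1-D sample x, using Gaussian kernels.

    Convolving two Gaussian kernels of bandwidth sigma yields one
    Gaussian of bandwidth sigma*sqrt(2), so the integral reduces to
    (1/N^2) * sum_i sum_j G_{sigma*sqrt(2)}(x_i - x_j).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    diffs = x[:, None] - x[None, :]      # all pairwise differences x_i - x_j
    s2 = 2.0 * sigma**2                  # variance of the convolved kernel
    k = np.exp(-diffs**2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    return k.sum() / n**2

# Renyi's quadratic entropy then follows as H_2(X) = -log V(X):
sample = np.random.default_rng(0).normal(size=500)
v = information_potential(sample, sigma=0.5)
print("information potential:", v, " quadratic entropy:", -np.log(v))
```

Note that the estimator never computes the entropy integral directly; all the information about the sample enters through pairwise interactions, which is what links the information potential to kernel methods and RKHS throughout the book.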