
Liquid State Machines: An Annotated Bibliography

John J. Barton.

Papers Discussing Liquid State Machines Directly

Liquid State Machines
Furthermore, this alternative computation style is supported by theoretical results (see section 4), which suggest that it is in principle as powerful as von Neumann style computational models such as Turing machines, but more adequate for the type of real-time computing on analog input streams that is carried out by the nervous system. [On the Computational Power of Circuits of Spiking Neurons, Wolfgang Maass, Henry Markram]
LSM in a bucket of water (!)
Fernando, C. & Sojakka, S. (2003), Pattern recognition in a bucket, in ‘Proceedings of the Seventh European Conference on Artificial Life (ECAL 2003)’
LSM for robotics
[Biologically inspired neural networks for the control of embodied agents Razvan V. Florian] Clearly explained review of spiking networks and LSM.
LSM for artificial mouse, Review
The most important aspect of the liquid is to react differently enough on different input sequences; the amount of distance created between those is called the separation property (SP) of the liquid. The SP (see fig. 3) reflects the ability of the liquid to create different trajectories of internal states for each of the input classes. The ability of the readout units to distinguish these trajectories, generalize and relate them to the desired outputs is called the approximation property (AP). This property depends on the adaptability of the chosen readout units, whereas the SP is based directly on the liquid’s complexity. [Liquid State Machines, a review Jilles Vreeken]
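The separation property can be made concrete with a minimal sketch (all parameters made up here; a random recurrent tanh network stands in for a spiking liquid): drive the same "liquid" with two different input streams and measure the distance between the resulting trajectories of internal states.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100  # number of liquid units (arbitrary choice)

# Hypothetical "liquid": a random recurrent network. tanh units keep
# the sketch simple; a real LSM would use spiking neurons.
W = rng.normal(0, 1.0 / np.sqrt(N), (N, N))
w_in = rng.normal(0, 1.0, N)

def liquid_trajectory(u):
    """Run the liquid on input sequence u; return the state trajectory."""
    x = np.zeros(N)
    states = []
    for u_t in u:
        x = np.tanh(W @ x + w_in * u_t)
        states.append(x.copy())
    return np.array(states)

# Two different input streams from two hypothetical input classes.
u_a = rng.normal(size=200)
u_b = rng.normal(size=200)

# Separation: time-averaged distance between the two state trajectories.
sep = np.mean(np.linalg.norm(liquid_trajectory(u_a)
                             - liquid_trajectory(u_b), axis=1))
print(sep > 0.0)  # different inputs yield separated trajectories
```

The readout's job (the approximation property) would then be to map these separated trajectories to target outputs.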
LSM with columnar structure.
In the LSM model there is a trade-off between the complexity of the liquid and the complexity of the readout. The optimal point for this tradeoff depends on factors such as the kinds and number of target filters that have to be simultaneously implemented. [P. Joshi. Synthesis of a Liquid State Machine with Hopfield/Brody Transient Synchrony. Master's Thesis, Center for Advanced Computer Studies, University of Louisiana, Lafayette, U.S.A., Nov. 2002.]
A Universal Approximation Theorem for Dynamic Networks
A time-invariant filter with fading memory can be approximated with 1) a dynamic network or 2) a Volterra series. A dynamic network has time-dependent weights; these can model facilitation or depression. [Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics, Thomas Natschläger, Wolfgang Maass, Eduardo D. Sontag, Anthony Zador]
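For concreteness, a discrete-time second-order Volterra filter looks like this (a truncated Volterra series; finite kernel length gives the fading memory; all kernel values below are made up):

```python
import numpy as np

def volterra2(u, h1, h2):
    """Second-order Volterra filter:
    y[n] = sum_i h1[i] u[n-i] + sum_{i,j} h2[i,j] u[n-i] u[n-j]."""
    M = len(h1)
    y = np.zeros(len(u))
    for n in range(len(u)):
        # Window of past samples u[n], u[n-1], ..., u[n-M+1], zero-padded.
        w = np.array([u[n - i] if n - i >= 0 else 0.0 for i in range(M)])
        y[n] = h1 @ w + w @ h2 @ w
    return y

rng = np.random.default_rng(2)
u = rng.normal(size=50)
h1 = np.array([0.5, 0.25, 0.125])  # linear kernel, decaying (fading memory)
h2 = 0.05 * np.outer(h1, h1)       # quadratic kernel
print(volterra2(u, h1, h2)[:3])
```

Maass's theoretical results concern filters of this general class; the liquid plus readout approximates them without explicitly representing the kernels h1, h2.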
LSM Slide deck by Maass
LSM and Radial Basis Functions
Online clustering with spiking neurons using temporal coding. [T. Natschläger and B. Ruf. In L. S. Smith and A. Hamilton, editors, Neuromorphic Systems: Engineering Silicon from Neurobiology], local update rules, radial basis functions
We propose a mechanism for unsupervised learning in networks of spiking neurons which is based on the timing of single firing events. Our results show that a topology preserving behaviour quite similar to that of Kohonen's self-organizing map can be achieved using temporal coding. In contrast to previous approaches, which use rate coding, the winner among competing neurons can be determined fast and locally. Hence our model is a further step towards a more realistic description of unsupervised learning. [Self-Organization of Spiking Neurons Using Action Potential Timing, Berthold Ruf, Michael Schmitt]
Echo State Machines
The "echo state" approach looks at RNNs from a new angle. Large RNNs are interpreted as "reservoirs" of complex, excitable dynamics. Output units "tap" from this reservoir. This idea leads to training algorithms where only the network-to-output connection weights have to be trained. This can be done with known, highly efficient linear regression algorithms. See also Adaptive Nonlinear System Identification with Echo State Networks
Herbert Jaeger
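The echo-state idea can be sketched in a few lines (all sizes and scalings below are assumed for illustration, not Jaeger's exact setup): a fixed random reservoir is run on the input, and only the reservoir-to-output weights are fit, by ordinary ridge regression.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 200, 1000  # reservoir size, sequence length (arbitrary)

# Fixed random reservoir, rescaled so its spectral radius is below 1
# (a common sufficient condition for the echo state property).
W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
w_in = rng.uniform(-0.5, 0.5, N)

# Toy teacher task: reproduce the input delayed by 3 steps.
u = rng.uniform(-1, 1, T)
y = np.roll(u, 3)

# Run the reservoir and collect its states.
X = np.zeros((T, N))
x = np.zeros(N)
for t in range(T):
    x = np.tanh(W @ x + w_in * u[t])
    X[t] = x

# Train ONLY the readout: ridge regression, discarding a washout period.
washout, ridge = 100, 1e-6
A = X[washout:]
w_out = np.linalg.solve(A.T @ A + ridge * np.eye(N), A.T @ y[washout:])

pred = A @ w_out
print(np.mean((pred - y[washout:]) ** 2))  # small readout training error
```

The reservoir itself is never trained, which is what makes the approach cheap: the only optimization is a linear least-squares problem.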


Kernel Methods; Support Vector Machines

The solutions sought by kernel-based algorithms such as the support vector machine (SVM) are affine functions in the feature space:

   f(x) = <w, Phi(x)> + b

for some weight vector w from the feature space. The kernel can be exploited whenever the weight vector can be expressed as a linear combination of the training points,

   w = sum_{i=1}^{n} alpha_i Phi(x_i),

implying that we can express f as

   f(x) = sum_{i=1}^{n} alpha_i k(x_i, x) + b. [Learning the Kernel Matrix with Semidefinite Programming, G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, M. I. Jordan]
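The kernel expansion above can be evaluated directly, without ever forming w in the feature space; a minimal sketch with a Gaussian (RBF) kernel and made-up support points and coefficients:

```python
import numpy as np

def rbf_kernel(xi, x, gamma=1.0):
    """Gaussian kernel k(xi, x) = exp(-gamma * ||xi - x||^2)."""
    return np.exp(-gamma * np.sum((xi - x) ** 2))

def f(x, support_points, alpha, b, gamma=1.0):
    """f(x) = sum_i alpha_i * k(x_i, x) + b -- the kernel expansion of
    the affine decision function; w is never formed explicitly."""
    return sum(a * rbf_kernel(xi, x, gamma)
               for a, xi in zip(alpha, support_points)) + b

# Toy example: two support points with coefficients of opposite sign.
pts = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
alpha = [1.0, -1.0]
print(f(np.array([0.0, 0.0]), pts, alpha, b=0.5))  # ≈ 1.3647
```

This is the "kernel trick" in one line: only inner products k(x_i, x) are needed, so the feature map Phi can be implicit and even infinite-dimensional.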

Knowledge-based analysis of microarray gene expression data by using support vector machines [Michael P. S. Brown, William Noble Grundy, David Lin, Nello Cristianini, Charles Walsh Sugnet, Terrence S. Furey, Manuel Ares, Jr., and David Haussler]
Library of Kernel Matrices
It is natural to envision libraries of kernel matrices in fields such as bioinformatics, computational vision, and information retrieval, in which multiple data sources abound. Such libraries would summarize the statistically-relevant features of primary data, and encapsulate domain specific knowledge. Tools such as the semidefinite programming methods that we have presented here can be used to bring these multiple data sources together in novel ways to make predictions and decisions. [Learning the kernel matrix with semidefinite programming. G. R. G. Lanckriet, N. Cristianini, L. El Ghaoui, P. L. Bartlett, and M. I. Jordan. In press: Journal of Machine Learning Research, 2003.]
Spikes and Support Vector Machines
[Spikernels: Embedding Spiking Neurons in Inner-Product Spaces, Lavi Shpigelman] Should connect to Kevin Judd's paper. No reference to Maass.
Does LSM implement adaptive volterra filter?
The message of Maass's work is that the recurrent neural network can compute a filter equivalent to a Volterra filter. If the linear variables in the LSM are set adaptively, then we would have an adaptive Volterra filter, the latter defined in, e.g., [ADAPTIVE VOLTERRA FILTERS FOR NONLINEAR ACOUSTIC ECHO CANCELLATION, A. Stenger and R. Rabenstein]
Nonlinear Analysis of Time Series
Several methods exist for adjusting nonlinear parameters. A common technique begins with k basis functions with arbitrarily chosen parameters, then adjusts the parameters by gradient descent to find an optimal model. Another technique makes a grid search over a region of parameter space. However, we observe, in the light of Section 1.2, that parameters need only be specified to some precision. If one is lucky, or careful, the precision required of nonlinear parameters is much less than that required for linear parameters, and hence accurate adjustment of nonlinear parameters may not be critical to a model's performance. This does seem to be supported by experience in using radial basis functions, where the literature is full of good models built from even randomly chosen centers. Consequently, we propose the following method to optimize the nonlinear parameters: initially choose a large number of basis functions with various arbitrary values for the nonlinear parameters, and then select the k basis functions that give the best model. Of course, this is what we have already called the restricted-selection problem; now, however, we are using the fact that if enough basis functions were initially chosen, at least some of them would lie near to the optimal values for the nonlinear parameters and be indistinguishable at the precision required of the optimal values. [K. Judd and A. Mees. On selecting models for nonlinear time series. Physica D, 82:426-444, 1995. http://citeseer.nj.nec.com/judd95selecting.html] Perhaps this is the mechanism of LSM?
[see also: Radial-Basis Models for Feedback Systems With Fading Memory (2001) David M. Walker, Nicholas B. Tufillaro, Paul Gross]
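The proposed mechanism — draw many basis functions with random nonlinear parameters, then keep the k that give the best model — can be sketched as follows (Gaussian RBFs with random centers and widths; the greedy forward selection here is one simple stand-in for the paper's restricted-selection step, not their exact algorithm):

```python
import numpy as np

rng = np.random.default_rng(3)

# Target: a 1-D nonlinear function to model (made up for illustration).
x = np.linspace(-3, 3, 200)
y = np.sin(2 * x) * np.exp(-0.1 * x**2)

# Large pool of RBFs with randomly chosen nonlinear parameters
# (centers and widths); no gradient descent on these.
P = 100
centers = rng.uniform(-3, 3, P)
widths = rng.uniform(0.3, 1.5, P)
Phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / widths[None, :] ** 2)

# Greedily select the k pool members giving the best least-squares fit;
# only the linear coefficients are ever fit exactly.
k, chosen = 10, []
residual = y.copy()
for _ in range(k):
    scores = np.abs(Phi.T @ residual)   # correlation with current residual
    scores[chosen] = -np.inf            # don't pick the same basis twice
    chosen.append(int(np.argmax(scores)))
    A = Phi[:, chosen]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef

print(np.mean(residual**2) < np.mean(y**2))  # beats the zero model
```

The analogy to the LSM would be that the liquid plays the role of the large pool of randomly parameterized basis functions, and the trained linear readout plays the role of the selected linear combination.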
May connect SVM and LSM. Also connects SVM to Bayes.
In this paper we extend the conformal method of modifying a kernel function to improve the performance of Support Vector Machine classifiers [14, 15]. The kernel function is conformally transformed in a data-dependent way by using the information of Support Vectors obtained in primary training. We further investigate the performances of modified Gaussian Radial Basis Function and Polynomial kernels. Simulation results for two artificial data sets show that the method is very effective, especially for correcting bad kernels. [Conformal Transformation of Kernel Functions: A Data-Dependent Way to Improve Support Vector Machine Classifiers, Si Wu and Shun-ichi Amari]
A programming mechanism?
Support Vector Machines for Analog Circuit Performance Representation, F. De Bernardinis, M. I. Jordan, A. Sangiovanni-Vincentelli

Possible hardware implementations

Nonlinear MEMS filter
Parametrically Excited MEMS-Based Filters Kimberly L. Turner Steven W. Shaw
Translinear Circuits. "silicon slide rules"
...multiple-input translinear elements; such elements produce output currents that are proportional to the exponential of a weighted sum of their input voltages. We can implement the weighted voltage summations with either resistive or capacitive voltage dividers. We can obtain the required exponential voltage-to-current transformations from either bipolar transistors or subthreshold MOS transistors. The subthreshold floating-gate MOS transistor naturally implements the exponential-of-a-weighted-sum operation in a single device. [Analysis, Synthesis, and Implementation of Networks of Multiple-Input Translinear Elements, B. Minch] Cornell. Note: subthreshold MOS is 100 mV range.
Field Programmable Learning Array
The FPLA is a mixed-signal counterpart to the all-digital Field-Programmable Gate Array in that it enables rapid prototyping of algorithms in hardware. Unlike the FPGA, the FPLA is targeted directly for machine learning by providing local, parallel, online analog learning using floating-gate MOS synapse transistors.[Field-Programmable Learning Arrays Seth Bridges, Miguel Figueroa, David Hsu, and Chris Diorio]


Just a list of areas...

Music and Speech Synthesis
...investigate the modeling of musical and speech signals and demonstrate that the model may be used for synthesis of musical and speech signals. [Neural Network Modeling of Speech and Music Signals, Axel Röbel]


Nanocomputing with Delays José A. B. Fortes

In this paper, we develop a spatio-temporal memory that blends properties from long and short-term memory and is motivated by reaction diffusion mechanisms. The winning processing element of a self-organizing network creates traveling waves on the output space that gradually attenuate over time and space to diffuse temporal information and create localized spatio-temporal neighborhoods for clustering. The novelty of the model is in the creation of time varying Voronoi tessellations anticipating the learned input signal dynamics even when the cluster centers are fixed. We test the method in a robot navigation task and in vector quantization of speech. This method performs better than conventional static vector quantizers based on the same data set and similar training conditions. [Principles and networks for self-organization in space-time; Jose Principe Neil Euliano Shayan Garani ]


Another recurrent neural network with (fading?) memory: Jose C. Principe, James Kuo, Samel Celebi, "An analysis of the gamma memory in dynamic neural networks," IEEE Trans. on Neural Networks, Vol. 5, No. 2, pp. 331-337, Mar. 1994.