[1] G. J. McLachlan and D. Peel, Finite Mixture Models. Wiley: New York,
2000.
[2] B. W. Silverman, Density Estimation for Statistics and Data Analysis.
Chapman and Hall: London, 1986.
[3] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis.
Wiley: New York, 1973.
[4] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford University
Press: Oxford, 1995.
[5] S. Chen, X. Hong, and C. J. Harris, “Particle swarm optimization aided
orthogonal forward regression for unified data modelling,” IEEE Trans.
Evolutionary Computation, vol. 14, no. 4, pp. 477–499, Aug. 2010.
[6] E. Parzen, “On estimation of a probability density function and mode,”
Annals of Mathematical Statistics, vol. 33, no. 3, pp. 1065–1076, Sept.
1962.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood
from incomplete data via the EM algorithm,” J. Royal Statistical Society
B, vol. 39, no. 1, pp. 1–38, 1977.
[8] J. A. Bilmes, “A gentle tutorial of the EM algorithm and its application to
parameter estimation for Gaussian mixture and hidden Markov models,”
Technical Report ICSI-TR-97-021, University of California, Berkeley,
1998.
[9] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap. Chapman
& Hall: London, 1993.
[10] Z. R. Yang and S. Chen, “Robust maximum likelihood training of
heteroscedastic probabilistic neural networks,” Neural Networks, vol. 11,
no. 4, pp. 739–747, June 1998.
[11] J. Weston, A. Gammerman, M. O. Stitson, V. Vapnik, V. Vovk, and
C. Watkins, “Support vector density estimation,” in Advances in Kernel
Methods – Support Vector Learning, B. Schölkopf, C. Burges, and
A. J. Smola, Eds. MIT Press: Cambridge, MA, 1999, pp. 293–306.
[12] V. Vapnik and S. Mukherjee, “Support vector method for multivariate
density estimation,” in Advances in Neural Information Processing Systems,
S. Solla, T. Leen, and K. R. Müller, Eds. MIT Press: Cambridge,
MA, 2000, pp. 659–665.
[13] A. Choudhury, Fast Machine Learning Algorithms for Large Data, Ph.D.
dissertation, School of Engineering Sciences, University of Southampton,
2002.
[14] S. Chen, S. A. Billings, and W. Luo, “Orthogonal least squares methods
and their applications to non-linear system identification,” Int. J. Control,
vol. 50, no. 5, pp. 1873–1896, 1989.
[15] X. Hong, P. M. Sharkey, and K. Warwick, “Automatic nonlinear predictive
model construction algorithm using forward regression and the
PRESS statistic,” IEE Proc. Control Theory Applications, vol. 150, no. 3,
pp. 245–254, 2003.
[16] S. Chen, X. Hong, C. J. Harris, and P. M. Sharkey, “Sparse modelling
using forward regression with PRESS statistic and regularization,” IEEE
Trans. Systems, Man and Cybernetics, Part B, vol. 34, no. 2, pp. 898–
911, 2004.
[17] S. Chen, X. Hong, and C. J. Harris, “Sparse kernel density construction
using orthogonal forward regression with leave-one-out test score and
local regularization,” IEEE Trans. Systems, Man, and Cybernetics, Part
B, vol. 34, no. 4, pp. 1708–1717, Aug. 2004.
[18] S. Chen, X. Hong, and C. J. Harris, “An orthogonal forward regression
technique for sparse kernel density estimation,” Neurocomputing,
vol. 71, nos. 4–6, pp. 931–943, Jan. 2008.
[19] M. Girolami and C. He, “Probability density estimation from optimally
condensed data samples,” IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 25, no. 10, pp. 1253–1264, Oct. 2003.
[20] D. W. Scott, “Parametric statistical modeling by minimum integrated
square error,” Technometrics, vol. 43, no. 3, pp. 274–285, Aug. 2001.
[21] X. Hong and C. J. Harris, “A mixture of experts network structure
construction algorithm for modelling and control,” Applied Intelligence,
vol. 16, no. 1, pp. 59–69, 2002.
[22] G. Rätsch, T. Onoda, and K. R. Müller, “Soft margins for AdaBoost,”
Machine Learning, vol. 42, no. 3, pp. 287–320, 2001.
[23] G. Rätsch, “http://www.fml.tuebingen.mpg.de/members/raetsch/benchmark,”