[1] J. Weston, A. Elisseeff, B. Schölkopf and M. Tipping, “Use of
the zero-norm with linear models and kernel methods,” J. Machine
Learning Research, vol.3, pp.1439–1461, 2003.
[2] P.S. Bradley and O.L. Mangasarian, “Feature selection via concave
minimization and support vector machines,” in Proc. 13th ICML (San
Francisco, CA, USA), 1998, pp.82–90.
[3] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis.
New York: Wiley, 1973.
[4] C.M. Bishop, Neural Networks for Pattern Recognition. Oxford, UK:
Oxford University Press, 1995.
[5] B.W. Silverman, Density Estimation. London: Chapman and Hall,
1996.
[6] H. Wang, “Robust control of the output probability density functions
for multivariable stochastic systems with guaranteed stability,” IEEE
Trans. Automatic Control, vol.44, no.11, pp.2103–2107, 1999.
[7] S. Chen, A.K. Samingan, B. Mulgrew and L. Hanzo, “Adaptive
minimum-BER linear multiuser detection for DS-CDMA signals in
multipath channels,” IEEE Trans. Signal Processing, vol.49, no.6,
pp.1240–1247, 2001.
[8] E. Parzen, “On estimation of a probability density function and mode,”
The Annals of Mathematical Statistics, vol.33, pp.1066–1076, 1962.
[9] G. McLachlan and D. Peel, Finite Mixture Models. New York: John
Wiley, 2000.
[10] J.A. Bilmes, “A gentle tutorial of the EM algorithm and its application
to parameter estimation for Gaussian mixture and hidden Markov
models,” Technical Report, ICSI-TR-97-021, University of California,
Berkeley, 1997.
[11] J. Weston, A. Gammerman, M.O. Stitson, V. Vapnik, V. Vovk and
C. Watkins, “Support vector density estimation,” in: B. Schölkopf, C.
Burges and A.J. Smola, eds., Advances in Kernel Methods — Support
Vector Learning, MIT Press, Cambridge MA, 1999, pp.293–306.
[12] V. Vapnik and S. Mukherjee, “Support vector method for multivariate
density estimation,” in: S. Solla, T. Leen and K.R. Müller, eds.,
Advances in Neural Information Processing Systems, MIT Press, 2000,
pp.659–665.
[13] M. Girolami and C. He, “Probability density estimation from optimally
condensed data samples,” IEEE Trans. Pattern Analysis and Machine
Intelligence, vol.25, no.10, pp.1253–1264, 2003.
[14] A. Choudhury, Fast Machine Learning Algorithms for Large Data.
PhD Thesis, Computational Engineering and Design Centre, School
of Engineering Sciences, University of Southampton, 2002.
[15] X. Hong, P.M. Sharkey and K. Warwick, “Automatic nonlinear predictive
model construction algorithm using forward regression and the
PRESS statistic,” IEE Proc. Control Theory and Applications, vol.150,
no.3, pp.245–254, 2003.
[16] S. Chen, X. Hong, C.J. Harris and P.M. Sharkey, “Sparse modeling
using orthogonal forward regression with PRESS statistic and regularization,”
IEEE Trans. Systems, Man and Cybernetics, Part B, vol.34,
no.2, pp.898–911, 2004.
[17] S. Chen, X. Hong and C.J. Harris, “Sparse kernel density construction
using orthogonal forward regression with leave-one-out test score and
local regularization,” IEEE Trans. Systems, Man and Cybernetics, Part
B, vol.34, no.4, pp.1708–1717, 2004.
[18] S. Chen, X. Hong and C.J. Harris, “An orthogonal forward regression
technique for sparse kernel density estimation,” Neurocomputing,
vol.71, no.4-6, pp.931–943, 2008.
[19] F. Sha, L.K. Saul and D.D. Lee, “Multiplicative updates for nonnegative
quadratic programming in support vector machines,” Technical
Report MS-CIS-02-19, University of Pennsylvania, USA, 2002.
[20] S. Chen, X. Hong and C.J. Harris, “Sparse kernel density estimator using
orthogonal regression based on D-optimality experimental design,”
in Proc. IJCNN 2008 (Hong Kong, China), June 1-6, 2008, pp.1–6.
[21] E. Amaldi and V. Kann, “On the approximability of minimizing
nonzero variables or unsatisfied relations in linear systems,” Theoretical
Computer Science, vol.209, pp.237–260, 1998.
[22] A.C. Atkinson and A.N. Donev, Optimum Experimental Designs.
Oxford, U.K.: Clarendon Press, 1992.
[23] S. Chen and J. Wigger, “Fast orthogonal least squares algorithm
for efficient subset model selection,” IEEE Trans. Signal Processing,
vol.43, no.7, pp.1713–1715, 1995.