[1] G. McLachlan and D. Peel, Finite Mixture Models. New York, NY, USA:
Wiley, 2000.
[2] B. W. Silverman, Density Estimation for Statistics and Data Analysis.
London, U.K.: Chapman & Hall, 1986.
[3] L. Rutkowski, “Adaptive probabilistic neural networks for pattern classification
in time-varying environment,” IEEE Trans. Neural Netw.,
vol. 15, no. 4, pp. 811–827, Jul. 2004.
[4] H. Yin and N. M. Allinson, “Self-organizing mixture networks for
probability density estimation,” IEEE Trans. Neural Netw., vol. 12, no. 2,
pp. 405–411, Mar. 2001.
[5] Z. Halbe, M. Bortman, and M. Aladjem, “Regularized mixture density
estimation with an analytical setting of shrinkage intensities,” IEEE
Trans. Neural Netw. Learn. Syst., vol. 24, no. 3, pp. 460–473, Mar. 2013.
[6] K. Zhang and J. T. Kwok, “Simplifying mixture models through function
approximation,” IEEE Trans. Neural Netw., vol. 21, no. 4, pp. 644–658,
Apr. 2010.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood
from incomplete data via the EM algorithm,” J. Roy. Statist.
Soc. B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.
[8] J. A. Bilmes, “A gentle tutorial of the EM algorithm and its application to
parameter estimation for Gaussian mixture and hidden Markov models,”
Dept. Elect. Eng. Comput. Sci., Univ. California, Berkeley, Berkeley,
CA, USA, Tech. Rep. ICSI-TR-97-021, 1998.
[9] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap. London,
U.K.: Chapman & Hall, 1993.
[10] Z. R. Yang and S. Chen, “Robust maximum likelihood training of
heteroscedastic probabilistic neural networks,” Neural Netw., vol. 11,
no. 4, pp. 739–747, Jun. 1998.
[11] E. Parzen, “On estimation of a probability density function and mode,”
Ann. Math. Statist., vol. 33, no. 3, pp. 1065–1076, Sep. 1962.
[12] J. Weston, A. Gammerman, M. O. Stitson, V. Vapnik, V. Vovk, and
C. Watkins, “Support vector density estimation,” in Advances in Kernel
Methods: Support Vector Learning, B. Schölkopf, C. J. C. Burges, and
A. J. Smola, Eds. Cambridge, MA, USA: MIT Press, 1999, pp. 293–306.
[13] V. N. Vapnik and S. Mukherjee, “Support vector method for multivariate
density estimation,” in Advances in Neural Information Processing
Systems, S. A. Solla, T. K. Leen, and K.-R. Müller, Eds. Cambridge, MA,
USA: MIT Press, 2000, pp. 659–665.
[14] A. Choudhury, “Fast machine learning algorithms for large data,”
Ph.D. dissertation, School Eng. Sci., Univ. Southampton, Southampton,
U.K., 2002.
[15] S. Chen, S. A. Billings, and W. Luo, “Orthogonal least squares methods
and their application to non-linear system identification,” Int. J. Control,
vol. 50, no. 5, pp. 1873–1896, 1989.
[16] X. Hong, P. M. Sharkey, and K. Warwick, “Automatic nonlinear predictive
model-construction algorithm using forward regression and the
PRESS statistic,” IEE Proc.-Control Theory Appl., vol. 150, no. 3,
pp. 245–254, May 2003.
[17] S. Chen, X. Hong, C. J. Harris, and P. M. Sharkey, “Sparse modeling
using orthogonal forward regression with PRESS statistic and regularization,”
IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 2,
pp. 898–911, Apr. 2004.
[18] S. Chen, X. Hong, and C. J. Harris, “Sparse kernel density construction
using orthogonal forward regression with leave-one-out test score and
local regularization,” IEEE Trans. Syst., Man, Cybern. B, Cybern.,
vol. 34, no. 4, pp. 1708–1717, Aug. 2004.
[19] S. Chen, X. Hong, and C. J. Harris, “An orthogonal forward regression
technique for sparse kernel density estimation,” Neurocomputing, vol. 71,
nos. 4–6, pp. 931–943, Jan. 2008.
[20] M. Girolami and C. He, “Probability density estimation from optimally
condensed data samples,” IEEE Trans. Pattern Anal. Mach. Intell.,
vol. 25, no. 10, pp. 1253–1264, Oct. 2003.
[21] D. W. Scott, “Parametric statistical modeling by minimum integrated
square error,” Technometrics, vol. 43, no. 3, pp. 274–285, Aug. 2001.
[22] X. Hong and C. J. Harris, “A mixture of experts network structure
construction algorithm for modelling and control,” Appl. Intell., vol. 16,
no. 1, pp. 59–69, 2002.
[23] X. Hong, S. Chen, A. Qatawneh, K. Daqrouq, M. Sheikh, and A. Morfeq,
“Sparse probability density function estimation using the minimum integrated
square error,” Neurocomputing, vol. 115, pp. 122–129, Sep. 2013.
[24] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms
on Matrix Manifolds. Princeton, NJ, USA: Princeton Univ. Press,
2008.
[25] B. Mishra, G. Meyer, F. Bach, and R. Sepulchre, “Low-rank optimization
with trace norm penalty,” SIAM J. Optim., vol. 23, no. 4, pp. 2124–2149,
2013.
[26] M. Harandi, R. Hartley, C. Shen, B. Lovell, and C. Sanderson. (2014).
“Extrinsic methods for coding and dictionary learning on Grassmann
manifolds.” [Online]. Available: http://arxiv.org/abs/1401.8126
[27] Y. M. Lui, “Advances in matrix manifolds for computer vision,” Image
Vis. Comput., vol. 30, nos. 6–7, pp. 380–388, 2012.
[28] J. Weston, A. Elisseeff, B. Schölkopf, and M. Tipping, “Use of the
zero norm with linear models and kernel methods,” J. Mach. Learn.
Res., vol. 3, no. 3, pp. 1439–1461, 2003.
[29] X. Hong, S. Chen, and C. J. Harris, “Using zero-norm constraint for
sparse probability density function estimation,” Int. J. Syst. Sci., vol. 43,
no. 11, pp. 2107–2113, 2012.
[30] R. Inokuchi and S. Miyamoto, “c-means clustering on the multinomial
manifold,” in Modeling Decisions for Artificial Intelligence
(Lecture Notes in Computer Science), vol. 4617. Berlin, Germany:
Springer-Verlag, 2007, pp. 261–268.
[31] Y. Sun, J. Gao, X. Hong, B. Mishra, and B. Yin, “Heterogeneous
tensor decomposition for clustering via manifold optimization,”
to be published.
[32] B. Vandereycken, “Riemannian and multilevel optimization for
rank-constrained matrix problems,” Ph.D. dissertation, Faculty Eng.,
Katholieke Univ. Leuven, Leuven, Belgium, 2010.
[33] C. G. Baker, “Riemannian manifold trust-region methods with applications
to eigenproblems,” Ph.D. dissertation, School Comput. Sci.,
Florida State Univ., Tallahassee, FL, USA, 2008.
[34] B. Mishra and R. Sepulchre. (2014). “Riemannian preconditioning.”
[Online]. Available: http://arxiv.org/abs/1405.6055
[35] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. (2013). “Manopt,
a MATLAB toolbox for optimization on manifolds.” [Online]. Available:
http://arxiv.org/abs/1308.5200
[36] B. D. Ripley, Pattern Recognition and Neural Networks. Cambridge,
U.K.: Cambridge Univ. Press, 1996.
[37] M. E. Tipping, “Sparse Bayesian learning and the relevance vector
machine,” J. Mach. Learn. Res., vol. 1, pp. 211–244, Jun. 2001.
[38] K. Bache and M. Lichman. (2013). “UCI machine learning repository,”
School Inf. Comput. Sci., Univ. California, Irvine, Irvine, CA, USA.
[Online]. Available: http://archive.ics.uci.edu/ml