[1] G. McLachlan and D. Peel, Finite Mixture Models. New York, NY, USA:
Wiley, 2000.
[2] B. W. Silverman, Density Estimation for Statistics and Data Analysis.
London, U.K.: Chapman & Hall, 1986.
[3] L. Rutkowski, “Adaptive probabilistic neural networks for pattern classification
in time-varying environment,” IEEE Trans. Neural Netw.,
vol. 15, no. 4, pp. 811–827, Jul. 2004.
[4] H. Yin and N. M. Allinson, “Self-organizing mixture networks for
probability density estimation,” IEEE Trans. Neural Netw., vol. 12, no. 2,
pp. 405–411, Mar. 2001.
[5] Z. Halbe, M. Bortman, and M. Aladjem, “Regularized mixture density
estimation with an analytical setting of shrinkage intensities,” IEEE
Trans. Neural Netw. Learn. Syst., vol. 24, no. 3, pp. 460–473, Mar. 2013.
[6] K. Zhang and J. T. Kwok, “Simplifying mixture models through function
approximation,” IEEE Trans. Neural Netw., vol. 21, no. 4, pp. 644–658,
Apr. 2010.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood
from incomplete data via the EM algorithm,” J. Roy. Statist.
Soc. B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.
[8] J. A. Bilmes, “A gentle tutorial of the EM algorithm and its application to
parameter estimation for Gaussian mixture and hidden Markov models,”
Dept. Elect. Eng. Comput. Sci., Univ. California, Berkeley, Berkeley,
CA, USA, Tech. Rep. ICSI-TR-97-021, 1998.
[9] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap. London,
U.K.: Chapman & Hall, 1993.
[10] Z. R. Yang and S. Chen, “Robust maximum likelihood training of
heteroscedastic probabilistic neural networks,” Neural Netw., vol. 11,
no. 4, pp. 739–747, Jun. 1998.
[11] E. Parzen, “On estimation of a probability density function and mode,”
Ann. Math. Statist., vol. 33, no. 3, pp. 1065–1076, Sep. 1962.
[12] J. Weston, A. Gammerman, M. O. Stitson, V. Vapnik, V. Vovk, and
C. Watkins, “Support vector density estimation,” in Advances in Kernel
Methods: Support Vector Learning, B. Schölkopf, C. J. C. Burges, and
A. J. Smola, Eds. Cambridge, MA, USA: MIT Press, 1999, pp. 293–306.
[13] V. N. Vapnik and S. Mukherjee, “Support vector method for multivariate
density estimation,” in Advances in Neural Information Processing
Systems, S. A. Solla, T. K. Leen, and K.-R. Müller, Eds. Cambridge, MA,
USA: MIT Press, 2000, pp. 659–665.
[14] A. Choudhury, “Fast machine learning algorithms for large data,”
Ph.D. dissertation, School Eng. Sci., Univ. Southampton, Southampton,
U.K., 2002.
[15] S. Chen, S. A. Billings, and W. Luo, “Orthogonal least squares methods
and their application to non-linear system identification,” Int. J. Control,
vol. 50, no. 5, pp. 1873–1896, 1989.
[16] X. Hong, P. M. Sharkey, and K. Warwick, “Automatic nonlinear predictive
model-construction algorithm using forward regression and the
PRESS statistic,” IEE Proc.-Control Theory Appl., vol. 150, no. 3,
pp. 245–254, May 2003.
[17] S. Chen, X. Hong, C. J. Harris, and P. M. Sharkey, “Sparse modeling
using orthogonal forward regression with PRESS statistic and regularization,”
IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 2,
pp. 898–911, Apr. 2004.
[18] S. Chen, X. Hong, and C. J. Harris, “Sparse kernel density construction
using orthogonal forward regression with leave-one-out test score and
local regularization,” IEEE Trans. Syst., Man, Cybern. B, Cybern.,
vol. 34, no. 4, pp. 1708–1717, Aug. 2004.
[19] S. Chen, X. Hong, and C. J. Harris, “An orthogonal forward regression
technique for sparse kernel density estimation,” Neurocomputing, vol. 71,
nos. 4–6, pp. 931–943, Jan. 2008.
[20] M. Girolami and C. He, “Probability density estimation from optimally
condensed data samples,” IEEE Trans. Pattern Anal. Mach. Intell.,
vol. 25, no. 10, pp. 1253–1264, Oct. 2003.
[21] D. W. Scott, “Parametric statistical modeling by minimum integrated
square error,” Technometrics, vol. 43, no. 3, pp. 274–285, Aug. 2001.
[22] X. Hong and C. J. Harris, “A mixture of experts network structure
construction algorithm for modelling and control,” Appl. Intell., vol. 16,
no. 1, pp. 59–69, 2002.
[23] X. Hong, S. Chen, A. Qatawneh, K. Daqrouq, M. Sheikh, and A. Morfeq,
“Sparse probability density function estimation using the minimum integrated
square error,” Neurocomputing, vol. 115, pp. 122–129, Sep. 2013.
[24] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms
on Matrix Manifolds. Princeton, NJ, USA: Princeton Univ. Press,
2008.
[25] B. Mishra, G. Meyer, F. Bach, and R. Sepulchre, “Low-rank optimization
with trace norm penalty,” SIAM J. Optim., vol. 23, no. 4, pp. 2124–2149,
2013.
[26] M. Harandi, R. Hartley, C. Shen, B. Lovell, and C. Sanderson. (2014).
“Extrinsic methods for coding and dictionary learning on Grassmann
manifolds.” [Online]. Available: http://arxiv.org/abs/1401.8126
[27] Y. M. Lui, “Advances in matrix manifolds for computer vision,” Image
Vis. Comput., vol. 30, nos. 6–7, pp. 380–388, 2012.
[28] J. Weston, A. Elisseeff, B. Schölkopf, and M. Tipping, “Use of the
zero norm with linear models and kernel methods,” J. Mach. Learn.
Res., vol. 3, no. 3, pp. 1439–1461, 2003.
[29] X. Hong, S. Chen, and C. J. Harris, “Using zero-norm constraint for
sparse probability density function estimation,” Int. J. Syst. Sci., vol. 43,
no. 11, pp. 2107–2113, 2012.
[30] R. Inokuchi and S. Miyamoto, “c-means clustering on the multinomial
manifold,” in Modeling Decisions for Artificial Intelligence
(Lecture Notes in Computer Science), vol. 4617. Berlin, Germany:
Springer-Verlag, 2007, pp. 261–268.
[31] Y. Sun, J. Gao, X. Hong, B. Mishra, and B. Yin, “Heterogeneous
tensor decomposition for clustering via manifold optimization,”
to be published.
[32] B. Vandereycken, “Riemannian and multilevel optimization for
rank-constrained matrix problems,” Ph.D. dissertation, Faculty Eng.,
Katholieke Univ. Leuven, Leuven, Belgium, 2010.
[33] C. G. Baker, “Riemannian manifold trust-region methods with applications
to eigenproblems,” Ph.D. dissertation, School Comput. Sci.,
Florida State Univ., Tallahassee, FL, USA, 2008.
[34] B. Mishra and R. Sepulchre. (2014). “Riemannian preconditioning.”
[Online]. Available: http://arxiv.org/abs/1405.6055
[35] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. (2013). “Manopt,
a MATLAB toolbox for optimization on manifolds.” [Online]. Available:
http://arxiv.org/abs/1308.5200
[36] B. D. Ripley, Pattern Recognition and Neural Networks. Cambridge,
U.K.: Cambridge Univ. Press, 1996.
[37] M. E. Tipping, “Sparse Bayesian learning and the relevance vector
machine,” J. Mach. Learn. Res., vol. 1, pp. 211–244, Jun. 2001.
[38] K. Bache and M. Lichman. (2013). “UCI machine learning repository,”
School Inf. Comput. Sci., Univ. California, Irvine, Irvine, CA, USA.
[Online]. Available: http://archive.ics.uci.edu/ml