[1] M. Stone, “Cross validatory choice and assessment of statistical predictions,”
J. Roy. Statist. Soc. Ser. B, vol. 36, no. 2, pp. 111–147, 1974.
[2] S. Chen, Y. Wu, and B. L. Luk, “Combined genetic algorithm optimization
and regularized orthogonal least squares learning for radial basis function
networks,” IEEE Trans. Neural Netw., vol. 10, no. 5, pp. 1239–1243,
Sep. 1999.
[3] M. J. L. Orr, “Regularization in the selection of radial basis function
centers,” Neural Comput., vol. 7, no. 3, pp. 606–623, May 1995.
[4] X. Hong and S. A. Billings, “Parameter estimation based on stacked
regression and evolutionary algorithms,” Proc. Inst. Elect. Eng.—Control
Theory Appl., vol. 146, no. 5, pp. 406–414, Sep. 1999.
[5] L. Ljung and T. Glad, Modelling of Dynamic Systems. Englewood Cliffs,
NJ: Prentice-Hall, 1994.
[6] H. Akaike, “A new look at the statistical model identification,” IEEE
Trans. Autom. Control, vol. AC-19, no. 6, pp. 716–723, Dec. 1974.
[7] V. Vapnik, The Nature of Statistical Learning Theory. New York:
Springer-Verlag, 1995.
[8] M. E. Tipping, “Sparse Bayesian learning and the relevance vector machine,”
J. Mach. Learn. Res., vol. 1, pp. 211–244, Sep. 2001.
[9] B. Schölkopf and A. J. Smola, Learning With Kernels: Support Vector
Machines, Regularization, Optimization, and Beyond. Cambridge, MA:
MIT Press, 2002.
[10] X. Hong and C. J. Harris, “Nonlinear model structure design and construction
using orthogonal least squares and D-optimality design,” IEEE
Trans. Neural Netw., vol. 13, no. 5, pp. 1245–1250, Sep. 2002.
[11] S. Chen, S. A. Billings, and W. Luo, “Orthogonal least squares methods
and their applications to non-linear system identification,” Int. J. Control,
vol. 50, no. 5, pp. 1873–1896, 1989.
[12] K. Z. Mao, “RBF neural network center selection based on Fisher ratio
class separability measure,” IEEE Trans. Neural Netw., vol. 13, no. 5,
pp. 1211–1217, Sep. 2002.
[13] S. Chen, X. X. Wang, X. Hong, and C. J. Harris, “Kernel classifier
construction using orthogonal forward selection and boosting with Fisher
ratio class separability,” IEEE Trans. Neural Netw., vol. 17, no. 6,
pp. 1652–1656, Nov. 2006.
[14] X. Hong, S. Chen, and C. J. Harris, “A fast kernel classifier construction
algorithm using orthogonal forward selection to minimize leave-one-out
misclassification rate,” Int. J. Syst. Sci., vol. 39, no. 2, pp. 119–125, 2008.
[15] S. Chen, X. Hong, and C. J. Harris, “Particle swarm optimization aided
orthogonal forward regression for unified data modelling,” IEEE Trans.
Evol. Comput., vol. 14, no. 4, pp. 477–499, Aug. 2010.
[16] J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proc. IEEE
Int. Conf. Neural Netw., Perth, Australia, Nov. 27–Dec. 1, 1995, vol. 4,
pp. 1942–1948.
[17] J. Kennedy and R. C. Eberhart, Swarm Intelligence. San Francisco, CA:
Morgan Kaufmann, 2001.
[18] D. W. van der Merwe and A. P. Engelbrecht, “Data clustering using
particle swarm optimization,” in Proc. CEC, Canberra, Australia,
Dec. 8–12, 2003, pp. 215–220.
[19] A. Ratnaweera, S. K. Halgamuge, and H. C. Watson, “Self-organizing
hierarchical particle swarm optimizer with time-varying acceleration coefficients,”
IEEE Trans. Evol. Comput., vol. 8, no. 3, pp. 240–255,
Jun. 2004.
[20] M. G. H. Omran, “Particle swarm optimization methods for pattern recognition
and image processing,” Ph.D. dissertation, Univ. Pretoria, Pretoria,
South Africa, 2005.
[21] S. M. Guru, S. K. Halgamuge, and S. Fernando, “Particle swarm optimisers
for cluster formation in wireless sensor networks,” in Proc.
Int. Conf. Intell. Sens., Sens. Netw. Inf. Process., Melbourne, Australia,
Dec. 5–8, 2005, pp. 319–324.
[22] K. K. Soo, Y. M. Siu, W. S. Chan, L. Yang, and R. S. Chen, “Particle-
swarm-optimization-based multiuser detector for CDMA communications,”
IEEE Trans. Veh. Technol., vol. 56, no. 5, pp. 3006–3013,
Sep. 2007.
[23] S. Chen, X. Hong, and C. J. Harris, “Sparse kernel regression modeling
using combined locally regularized orthogonal least squares and
D-optimality experimental design,” IEEE Trans. Autom. Control, vol. 48,
no. 6, pp. 1029–1036, Jun. 2003.
[24] H. Zou and T. Hastie, “Regularization and variable selection via the elastic
net,” J. Roy. Statist. Soc. Ser. B, vol. 67, no. 2, pp. 301–320, 2005.
[25] S. Chen, “Locally regularised orthogonal least squares algorithm for the
construction of sparse kernel regression models,” in Proc. 6th Int. Conf.
Signal Process., Beijing, China, 2002, pp. 1229–1232.
[26] D. J. C. MacKay, “Bayesian methods for adaptive models,” Ph.D. dissertation,
California Inst. Technol., Pasadena, CA, 1991.
[27] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition
by basis pursuit,” SIAM J. Sci. Comput., vol. 20, no. 1, pp. 33–61, 1998.
[28] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. Roy.
Statist. Soc. Ser. B, vol. 58, no. 1, pp. 267–288, 1996.
[29] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, “Least angle regression,”
Ann. Statist., vol. 32, no. 2, pp. 407–451, 2004.
[30] S. Chen, “Local regularization assisted orthogonal least squares regression,”
Neurocomputing, vol. 69, no. 4–6, pp. 559–585, Jan. 2006.
[31] L. Ljung, System Identification: Theory for the User. Englewood Cliffs,
NJ: Prentice-Hall, 1987.
[32] G. Rätsch, T. Onoda, and K. R. Müller, “Soft margins for AdaBoost,”
Mach. Learn., vol. 42, no. 3, pp. 287–320, Mar. 2001.
[33] G. Rätsch. [Online]. Available: http://www.fml.tuebingen.mpg.de/
members/raetsch/benchmark
[34] S. Chen and J. Wigger, “Fast orthogonal least squares algorithm for
efficient subset selection,” IEEE Trans. Signal Process., vol. 43, no. 7,
pp. 1713–1715, Jul. 1995.