References
[1] M. Stone, Cross-validatory choice and assessment of statistical predictions, J. R. Stat. Soc. Ser. B 36 (2) (1974) 117–147.
[2] R.H. Myers, Classical and Modern Regression with Applications, 2nd ed., PWS-KENT, Boston, 1990.
[3] S. Chen, S.A. Billings, W. Luo, Orthogonal least squares methods and their applications to non-linear system identification, Int. J. Control 50 (5) (1989) 1873–1896.
[4] M.J. Korenberg, Identifying nonlinear difference equation and functional expansion representations: the fast orthogonal algorithm, Ann. Biomed. Eng. 16 (1) (1988) 123–142.
[5] S. Chen, C.F.N. Cowan, P.M. Grant, Orthogonal least squares learning algorithm for radial basis function networks, IEEE Trans. Neural Netw. 2 (2) (1991) 302–309.
[6] L.-X. Wang, J.M. Mendel, Fuzzy basis functions, universal approximation, and orthogonal least-squares learning, IEEE Trans. Neural Netw. 5 (5) (1992) 807–814.
[7] X. Hong, C.J. Harris, Neurofuzzy design and model construction of nonlinear dynamical processes from data, IEE Proc. Control Theory Appl. 148 (6) (2001) 530–538.
[8] Q. Zhang, Using wavelet network in nonparametric estimation, IEEE Trans. Neural Netw. 8 (2) (1997) 227–236.
[9] S.A. Billings, H.L. Wei, The wavelet-NARMAX representation: a hybrid model structure combining polynomial models with multiresolution wavelet decompositions, Int. J. Syst. Sci. 36 (3) (2005) 137–152.
[10] N. Chiras, C. Evans, D. Rees, Nonlinear gas turbine modeling using NARMAX structures, IEEE Trans. Instrum. Meas. 50 (4) (2001) 893–898.
[11] Y. Gao, M.J. Er, Online adaptive fuzzy neural identification and control of a class of MIMO nonlinear systems, IEEE Trans. Fuzzy Syst. 11 (4) (2003) 462–477.
[12] K.M. Tsang, W.L. Chan, Adaptive control of power factor correction converter using nonlinear system identification, IEE Proc. Electr. Power Appl. 152 (3) (2005) 627–633.
[13] G.-C. Luh, W.-C. Cheng, Identification of immune models for fault detection, Proc. Inst. Mech. Eng. I: J. Syst. Control Eng. 218 (2004) 353–367.
[14] G.W. Chang, C. Chen, Y. Liu, A neural-network-based method of modeling electric arc furnace load for power engineering study, IEEE Trans. Power Syst. 25 (1) (2010) 138–146.
[15] B. Mutnury, M. Swaminathan, J.P. Libous, Macromodeling of nonlinear digital I/O drivers, IEEE Trans. Adv. Pack. 29 (1) (2006) 102–113.
[16] C. Huang, F. Wang, An RBF network with OLS and EPSO algorithms for real-time power dispatch, IEEE Trans. Power Syst. 22 (1) (2007) 96–104.
[17] R. Mukai, V.A. Vilnrotter, P. Arabshahi, V. Jammejad, Adaptive acquisition and tracking for deep space array feed antennas, IEEE Trans. Neural Netw. 13 (5) (2002) 1149–1162.
[18] V.S. Kodogiannis, J.N. Lygouras, A. Tarczynski, H.S. Chowdrey, Artificial odor discrimination system using electronic nose and neural networks for the identification of urinary tract infection, IEEE Trans. Inf. Technol. Biomed. 12 (6) (2008) 707–713.
[19] C. Kauffmann, P. Motreff, L. Sarry, In vivo supervised analysis of stent reendothelialization from optical coherence tomography, IEEE Trans. Med. Imaging 29 (3) (2010) 807–818.
[20] G.P. Asner, R.E. Martin, R. Tupayachi, et al., Taxonomy and remote sensing of leaf mass per area (LMA) in humid tropical forests, Ecol. Appl. 21 (1) (2011) 85–98.
[21] M.J.L. Orr, Regularization in the selection of radial basis function centers, Neural Comput. 7 (3) (1995) 606–623.
[22] S. Chen, E.S. Chng, K. Alkadhimi, Regularised orthogonal least squares algorithm for constructing radial basis function networks, Int. J. Control 64 (5) (1996) 829–837.
[23] S. Chen, Y. Wu, B.L. Luk, Combined genetic algorithm optimization and regularized orthogonal least squares learning for radial basis function networks, IEEE Trans. Neural Netw. 10 (5) (1999) 1239–1243.
[24] M.E. Tipping, Sparse Bayesian learning and the relevance vector machine, J. Mach. Learn. Res. 1 (2001) 211–244.
[25] S. Chen, X. Hong, C.J. Harris, Sparse kernel regression modelling using combined locally regularised orthogonal least squares and D-optimality experimental design, IEEE Trans. Autom. Control 48 (6) (2003) 1029–1036.
[26] S. Chen, X. Hong, C.J. Harris, P.M. Sharkey, Sparse modelling using forward regression with PRESS statistic and regularization, IEEE Trans. Syst. Man Cybern. B 34 (2) (2004) 898–911.
[27] S. Chen, Local regularization assisted orthogonal least squares regression, Neurocomputing 69 (4-6) (2006) 559–585.
[28] D.J.C. MacKay, Bayesian Methods for Adaptive Models (Ph.D. Thesis), California Institute of Technology, USA, 1992.
[29] C.E. Shannon, A mathematical theory of communication, Bell Syst. Tech. J. 27 (1948) 379–423.
[30] T.M. Cover, J.A. Thomas, Elements of Information Theory, John Wiley & Sons, 1991.
[31] G.L. Zheng, S.A. Billings, Radial basis function network configuration using mutual information and the orthogonal least squares algorithm, Neural Netw. 9 (9) (1996) 1619–1637.
[32] F. Rossi, A. Lendasse, D. François, V. Wertz, M. Verleysen, Mutual information for the selection of relevant variables in spectrometric nonlinear modelling, Chemom. Intell. Lab. Syst. 80 (2) (2006) 215–226.
[33] H. Peng, F. Long, C. Ding, Feature selection based on mutual information criteria of max-dependency, max-relevance and min-redundancy, IEEE Trans. Pattern Anal. Mach. Intell. 27 (8) (2005) 1226–1238.
[34] J.P.W. Pluim, J.B.A. Maintz, M.A. Viergever, Mutual information based registration of medical images: a survey, IEEE Trans. Med. Imaging 22 (8) (2003) 986–1004.
[35] X. Zhou, X. Wang, E.R. Dougherty, Nonlinear probit gene classification using mutual information and wavelet-based feature selection, J. Biol. Syst. 12 (3) (2004) 371–386.
[36] X. Hong, S. Chen, C.J. Harris, A fast kernel classifier construction algorithm using orthogonal forward selection to minimize leave-one-out misclassification rate, Int. J. Syst. Sci. 39 (2) (2008) 119–125.
[37] X. Hong, S. Chen, C.J. Harris, A kernel-based two class classifier for imbalanced data sets, IEEE Trans. Neural Netw. 18 (1) (2007) 28–41.
[38] X. Hong, P.M. Sharkey, K. Warwick, Automatic nonlinear predictive model construction algorithm using forward regression and the PRESS statistic, IEE Proc. Control Theory Appl. 150 (3) (2003) 245–254.
[39] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, 1996.
[40] G. Rätsch, T. Onoda, K.R. Müller, Soft margins for AdaBoost, Mach. Learn. 42 (3) (2001) 287–320.
[41] G. Rätsch, http://www.raetschlab.org/members/raetsch