## PDFOS: PDF estimation based over-sampling for imbalanced two-class problems
Gao, M., Hong, X., Chen, S., Harris, C. J. and Khalaf, E.
(2014)
Full text not archived in this repository. It is advisable to refer to the publisher's version if you intend to cite from this work.

DOI: 10.1016/j.neucom.2014.02.006

## Abstract/Summary

This contribution proposes a novel probability density function (PDF) estimation based over-sampling (PDFOS) approach for two-class imbalanced classification problems. The classical Parzen-window kernel function is adopted to estimate the PDF of the positive class. Synthetic instances are then generated according to the estimated PDF and used as additional training data. The essential concept is to re-balance the class distribution of the original imbalanced data set under the principle that the synthetic data samples follow the same statistical properties as the original data. Based on the over-sampled training data, the radial basis function (RBF) classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier's structure and the parameters of the RBF kernels are determined using a particle swarm optimisation algorithm based on the criterion of minimising the leave-one-out misclassification rate. The effectiveness of the proposed PDFOS approach is demonstrated by an empirical study on several imbalanced data sets.
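The over-sampling step described in the abstract can be illustrated with a minimal sketch. It assumes a Gaussian Parzen-window kernel whose covariance is the sample covariance of the positive class scaled by a squared bandwidth, and it falls back to Silverman's rule of thumb when no bandwidth is given. Both choices, the function name `pdfos_oversample`, and its parameters are assumptions made for illustration; the abstract does not state how the paper selects the kernel width. Drawing from such a kernel density estimate amounts to picking a positive-class sample at random and perturbing it with kernel-shaped noise.

```python
import numpy as np


def pdfos_oversample(X_pos, n_new, bandwidth=None, rng=None):
    """Draw n_new synthetic positive-class samples from a Gaussian
    Parzen-window estimate fitted to X_pos (shape: n_samples x n_features).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = np.random.default_rng(rng)
    X_pos = np.asarray(X_pos, dtype=float)
    n, d = X_pos.shape

    if bandwidth is None:
        # Assumption: Silverman's rule of thumb for a Gaussian kernel.
        bandwidth = (4.0 / (d + 2)) ** (1.0 / (d + 4)) * n ** (-1.0 / (d + 4))

    # Kernel covariance: sample covariance of the positive class,
    # scaled by the squared bandwidth.
    cov = np.atleast_2d(np.cov(X_pos, rowvar=False))

    # Sampling from the mixture: choose a kernel centre uniformly at random,
    # then add zero-mean Gaussian noise with the kernel covariance.
    centres = X_pos[rng.integers(0, n, size=n_new)]
    noise = rng.multivariate_normal(np.zeros(d), bandwidth**2 * cov, size=n_new)
    return centres + noise


# Usage: 20 positive samples in 2-D, over-sampled with 80 synthetic points.
X_pos = np.random.default_rng(0).normal(loc=[2.0, -1.0], size=(20, 2))
X_synth = pdfos_oversample(X_pos, n_new=80, rng=1)
X_balanced_pos = np.vstack([X_pos, X_synth])  # 100 positive samples in total
```

The synthetic points would then be appended to the positive class before the classifier is trained; the number of new samples controls how far the class distribution is re-balanced.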