Agarwal, S., Godbole, S., Punjani, D. and Roy, S. (2007) ‘How much noise is too much: a study in
automatic text classification’, Proc. ICDM-07, the 7th IEEE International Conference on Data
Mining, pp.3–12.
Aggarwal, C.C. and Zhai, C. (2012) ‘A survey of text clustering algorithms’, Mining Text Data,
pp.163–222, doi 10.1007/978-1-4614-3223-4_6, Springer-Verlag.
Apte, C., Damerau, F. and Weiss, S.M. (1994) ‘Automated learning of decision rules for text
categorization’, ACM Transactions on Information Systems, Vol. 12, No. 3, pp.233–251.
Brodley, C.E. and Friedl, M.A. (1996) ‘Identifying and eliminating mislabelled training instances’,
Proceedings of AAAI-96, the 13th National Conference on Artificial Intelligence, pp.799–805.
Cantu-Paz, E., Newsam, S. and Kamath, C. (2004) ‘Feature selection in scientific applications’,
Proceedings of KDD-04, the 10th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, pp.788–793, New York, USA.
Cohen, A.M., Bhupatiraju, R.T. and Hersh, W.R. (2004) ‘Feature generation, feature selection,
classifiers, and conceptual drift for biomedical document triage’, Proceedings of TREC-04, the
13th Text Retrieval Conference.
Dave, R.N. (1991) ‘Characterization and detection of noise in clustering’, Pattern Recognition
Letters, Vol. 12, No. 11, pp.657–664.
Daza, L. and Acuna, E. (2007) ‘An algorithm for detecting noise on supervised classification’,
Proceedings of WCECS-07, the 1st World Conference on Engineering and Computer Science,
pp.701–706.
Ding, Y., Korotkiy, M., Omelayenko, B., Kartseva, V., Zykov, V., Klein, M., Schulten, E. and
Fensel, D. (2002) ‘Goldenbullet: automated classification of product data in e-commerce’,
Proc. BIS-02, Poznan, pp.1–9.
Dumais, S.T., Platt, J., Heckerman, D. and Sahami, M. (1998) ‘Inductive learning algorithms and
representations for text categorization’, Proc. CIKM-98, pp.148–155.
Dunning, T. (1994) ‘Accurate methods for the statistics of surprise and coincidence’,
Computational Linguistics, Vol. 19, No. 1, pp.61–74.
Fensel, D., Ding, Y., Omelayenko, B., Schulten, E., Botquin, G., Brown, M. and Flett, A. (2001)
‘Product data integration in b2b e-commerce’, IEEE Intelligent Systems, Vol. 16, No. 4,
pp.54–59.
Forman, G. (2003) ‘An extensive empirical study of feature selection metrics for text
classification’, Journal of Machine Learning Research, March, Vol. 3, pp.1289–1305.
Gamberger, D., Lavrac, N. and Groselj, C. (1999) ‘Experiments with noise filtering in a medical
domain’, Proceedings of ICML-99, the 16th International Conference on Machine Learning,
pp.143–151.
GreenInsight (2012) [online] http://www.green-insight.com (accessed 8 August 2012).
Hepp, M., Leukel, J. and Schmitz, V. (2005) ‘A quantitative analysis of eCl@ss, UNSPSC, eOTD,
and RNTD content, coverage and maintenance’, Proc. ICEBE-05, pp.572–581.
Huang, S.H. (2003) ‘Dimensionality reduction in automatic knowledge acquisition: a simple greedy
search approach’, IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 6,
pp.1364–1373.
Ittner, D.J., Lewis, D.D. and Ahn, D.D. (1995) ‘Text categorization of low quality images’, Proc.
SDAIR-95, pp.301–315.
Jirapech-Umpai, T. and Aitken, S. (2005) ‘Feature selection and classification for microarray data
analysis – evolutionary methods for identifying predictive genes’, BMC Bioinformatics,
Vol. 6, No. 148, 11p, doi 10.1186/1471-2105-6-148.
Joachims, T. (1997) ‘A probabilistic analysis of the Rocchio algorithm with TFIDF for text
categorization’, Proc. ICML-97, pp.143–151.
Joachims, T. (1998) ‘Text categorization with support vector machines: learning with many
relevant features’, Proc. ECML-98, pp.137–142.
Joachims, T. (2001) Learning to Classify Text Using Support Vector Machines, Kluwer Academic
Publishers, Norwell, MA.
Lewis, D.D. (1998) ‘Naïve (Bayes) at forty: the independence assumption in information retrieval’,
Proc. ECML-98, pp.4–15.
Li, S., Xia, R., Zong, C. and Huang, C. (2009) ‘A framework for feature selection methods for text
categorization’, Proceedings of ACL-09, the 47th Annual Meeting of the Association for
Computational Linguistics, pp.692–700.
Liu, H. and Yu, L. (2005) ‘Toward integrating feature selection algorithms for classification
and clustering’, IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 4,
pp.491–502.
Mendonca, E.A., Cimino, J.J. and Johnson, S.B. (2001) ‘Using narrative reports to support a digital
library’, Journal of the American Medical Informatics Association, Vol. 8.
National Audit Office (2011) The Procurement of Consumables by NHS Hospital Trusts [online]
http://www.nao.org.uk/publications/1011/nhs_procurement.aspx (accessed 16 July 2012).
Quinlan, J.R. (1986) ‘Induction of decision trees’, Machine Learning, Vol. 1, No. 1, pp.81–106.
Quinlan, J.R. (1993) C4.5: Programs for Machine Learning, Morgan Kaufmann, San Francisco,
CA, USA.
Ramakrishnan, G., Chitrapura, K.P., Krishnapuram, R. and Bhattacharyya, P. (2005) ‘A model for
handling approximate, noisy or incomplete labeling in text classification’, Proceedings of
ICML-05, the 22nd International Conference on Machine Learning, pp.681–688.
Roberts, P.J. (2011) Automatic Product Classification, PhD thesis, University of Reading, UK.
Roberts, P.J., Howroyd, J., Mitchell, R.J. and Ruiz, V.F. (2010) ‘Identifying problematic classes in
text classification’, Proc. CIS2010, pp.136–141.
Roberts, P.J., Mitchell, R.J., Ruiz, V.F. and Bishop, J.M. (2012) ‘Classification in e-procurement’,
Proc. CIS2012, Limerick, pp.1–6.
Soucy, P. and Mineau, G.W. (2005) ‘Beyond TFIDF weighting for text categorization in the vector
space model’, Proc. IJCAI-05, the 19th International Joint Conference on Artificial
Intelligence, pp.1130–1135.
SpendInsight (2012) [online] http://www.spendinsight.com (accessed 8 August 2012).
Verbaeten, S. and van Assche, A. (2003) ‘Ensemble methods for noise elimination in classification
problems’, in Windeatt, T. and Roli, F. (Eds.): Multiple Classifier Systems, Vol. 2709, Lecture
Notes in Computer Science, pp.317–325.
Wolin, B. (2002) ‘Automatic classification in product catalogs’, Proc. SIGIR-02, pp.351–352.
Yang, Y. (1994) ‘Expert network: effective and efficient learning from human decisions in text
categorization and retrieval’, Proc. SIGIR-94, pp.13–22.
Yang, Y. (1999) ‘An evaluation of statistical approaches to text categorization’, Information
Retrieval, Vol. 1, No. 1, pp.69–90.
Yang, Y. and Pedersen, J.O. (1997) ‘A comparative study on feature selection in text
categorization’, Proc. ICML-97, the 14th International Conference on Machine Learning,
pp.412–420, Nashville, USA.
Zheng, Z., Wu, X. and Srihari, R. (2004) ‘Feature selection for text categorization on imbalanced
data’, ACM SIGKDD Explorations Newsletter, Vol. 6, No. 1, pp.80–89.