Classification in e-procurement

Roberts, Paul J; Mitchell, Richard; Ruiz, Virginie; Bishop, Mark

Full text not archived in this repository.

Roberts, P. J., Mitchell, R., Ruiz, V. and Bishop, M. (2014) Classification in e-procurement. International Journal of Applied Pattern Recognition, 1 (3). pp. 298-314. ISSN 2049-8888 doi: 10.1504/IJAPR.2014.065770

Abstract/Summary

Three coupled knowledge transfer partnerships used pattern recognition techniques to produce an e-procurement system which, the National Audit Office reports, could save the National Health Service £500m per annum. An extension to the system, GreenInsight, allows the environmental impact of procurements to be assessed and savings made. Both systems require suitable products to be discovered and equivalent products recognised, for which classification is a key component. This paper describes the innovative work done for product classification, feature selection and reducing the impact of mislabelled data.

Item Type	Article
URI	https://centaur.reading.ac.uk/id/eprint/39677
Identification Number/DOI	10.1504/IJAPR.2014.065770
Refereed	Yes
Divisions	Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
Uncontrolled Keywords	classification; feature selection; noise reduction; e-procurement.
Publisher	Inderscience Publishers

Deposit Details

Date Deposited:	20 Mar 2015 12:27
Last Modified:	10 May 2026 01:57