Automatic keyphrase extraction: a comparison of methods

Hussey, R., Williams, S. and Mitchell, R. (2012) Automatic keyphrase extraction: a comparison of methods. In: eKNOW, Proceedings of The Fourth International Conference on Information, Process, and Knowledge Management, Valencia, Spain, pp. 18-23.

Full text not archived in this repository.

It is advisable to refer to the publisher's version if you intend to cite from this work.

Official URL: http://www.thinkmind.org/index.php?view=article&ar...

Abstract/Summary

There are many published methods available for creating keyphrases for documents. Previous work in the field has shown that, in a significant proportion of cases, author-selected keyphrases are not appropriate for the document they accompany. Often the keyphrases are not updated when the focus of a paper changes, or they include terms that are more classificatory than explanatory. Automated methods are therefore needed to improve the use of keyphrases. The published methods are all evaluated using different corpora, typically one relevant to their field of study. This not only makes it difficult to incorporate the useful elements of algorithms in future work but also makes comparing the results of each method inefficient and ineffective. This paper describes the work undertaken to compare five methods across a common baseline of six corpora. The methods chosen were term frequency, inverse document frequency, the C-Value, the NC-Value, and a synonym-based approach. These methods were compared to evaluate performance and quality of results, and to provide a future benchmark. It is shown that, with the comparison metric used for this study, term frequency and inverse document frequency were the best algorithms, with the synonym-based approach following them. Further work in the area is required to determine an appropriate (or more appropriate) comparison metric.
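As a rough illustration of the two simplest methods named in the abstract, the sketch below scores terms by term frequency and inverse document frequency. It uses the common textbook definitions (relative count within a document, and log(N / df) across a corpus); the paper's exact formulations, tokenisation, and corpora are not given in this record, so the function names and the three-document toy corpus here are assumptions made purely for illustration.

```python
import math
from collections import Counter

def term_frequency(tokens):
    """Relative frequency of each term within a single document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}

def inverse_document_frequency(corpus):
    """log(N / df) per term, where df is the number of documents containing it."""
    n_docs = len(corpus)
    # Count each term at most once per document.
    df = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(n_docs / d) for term, d in df.items()}

def top_terms(scores, k=3):
    """Highest-scoring terms, with ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Hypothetical toy corpus, already tokenised on whitespace.
corpus = [
    "keyphrase extraction compares keyphrase methods".split(),
    "document classification methods".split(),
    "synonym analysis for keyphrase extraction".split(),
]

tf = term_frequency(corpus[0])
idf = inverse_document_frequency(corpus)
print(top_terms(tf, k=1))
print(top_terms(idf, k=3))
```

Term frequency favours words repeated within one document, while inverse document frequency favours words that appear in few documents. This sketch scores single words only; candidate multi-word phrases, stemming, and stop-word handling would need to be added for a realistic comparison.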

Item Type:	Conference or Workshop Item (Paper)
Refereed:	Yes
Divisions:	Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
ID Code:	27770
Uncontrolled Keywords:	Term Frequency, Inverse Document Frequency, C-Value, NC-Value, Synonyms, Comparisons, Automated Keyphrase Extraction, Document Classification
Additional Information:	ISBN 9781612081816
Publisher Statement:	Publisher makes available at http://www.thinkmind.org/index.php?view=article&articleid=eknow_2012_1_40_60072 as stated at http://www.iaria.org/conferences2012/eKNOW12.html

University of Reading

CentAUR: Central Archive at the University of Reading
