Back to basics: how measures of lexical diversity can help discriminate between CEFR levels

Treffers-Daller, J. (ORCID: https://orcid.org/0000-0002-6575-6736), Parslow, P. and Williams, S. (2018) Back to basics: how measures of lexical diversity can help discriminate between CEFR levels. Applied Linguistics, 39 (3). pp. 302-327. ISSN 1477-450X

Text: Accepted Version. It is advisable to refer to the publisher's version if you intend to cite from this work.

DOI: 10.1093/applin/amw009

Abstract/Summary

This study contributes to ongoing discussions on how measures of lexical diversity (LD) can help discriminate between essays from second language learners of English whose work has been assessed as belonging to levels B1 to C2 of the Common European Framework of Reference (CEFR). The focus is in particular on how different operationalisations of what constitutes a “different word” (type) affect the LD measures themselves and their ability to discriminate between CEFR levels. The results show that basic measures of LD, such as the number of different words, the TTR (Templin 1957) and the Index of Guiraud (Guiraud 1954), explain more variance in the CEFR levels than sophisticated measures, such as D (Malvern et al. 2004), HD-D (McCarthy and Jarvis 2007) and MTLD (McCarthy 2005), provided text length is kept constant across texts. A simple count of different words (defined as lemmas and not as word families) was the best predictor of CEFR levels and explained 22 percent of the variance in overall scores on the Pearson Test of English Academic in essays written by 176 test takers.
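
As a rough illustration of the three basic measures named in the abstract, the sketch below computes the number of different words (types), the TTR (types divided by tokens) and the Index of Guiraud (types divided by the square root of tokens) on a sample cut to a fixed number of tokens. It is not the authors' procedure: the `tokenize` and `basic_ld` helpers, the regular-expression tokenizer and the `length` parameter are assumptions made for this example, and lowercased word forms stand in for the lemmas used in the study.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and extract word tokens.

    Assumption: lowercased word forms are a crude stand-in for lemmas;
    the study itself counted lemmas, which needs a real lemmatizer.
    """
    return re.findall(r"[a-z']+", text.lower())


def basic_ld(tokens, length):
    """Return basic lexical diversity measures for the first `length` tokens.

    Cutting every text to the same `length` keeps text length constant
    across texts, the condition the abstract attaches to its comparison.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if len(tokens) < length:
        raise ValueError("text is shorter than the requested sample length")
    sample = tokens[:length]
    n_tokens = len(sample)
    n_types = len(set(sample))  # number of different words
    return {
        "types": n_types,
        "tokens": n_tokens,
        "ttr": n_types / n_tokens,                  # type-token ratio
        "guiraud": n_types / math.sqrt(n_tokens),   # Index of Guiraud
    }


if __name__ == "__main__":
    essay = "The cat sat on the mat and the dog sat too"
    print(basic_ld(tokenize(essay), length=9))
```

Because all three measures depend on the number of tokens, comparing essays of different lengths would confound diversity with length; the hypothetical `length` cut-off is one simple way to hold it fixed.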

Item Type: Article
Refereed: Yes
Divisions: Interdisciplinary Research Centres (IDRCs) > Centre for Literacy and Multilingualism (CeLM)
Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
Arts, Humanities and Social Science > Institute of Education > Language and Literacy in Education
ID Code: 54410
Uncontrolled Keywords: lexical diversity, CEFR, lemmatization, language testing, derivational morphology
Publisher: Oxford University Press
