Learning vocabulary through listening: the role of vocabulary knowledge and listening proficiency

This study explored the impact of preexisting vocabulary knowledge (PVK) and listening proficiency on the vocabulary learning through listening of 137 Chinese learners of English, when provided with three types of oral vocabulary explanations (second language [L2], codeswitching [CS], and contrastive focus-on-form [CFoF]) and when no explanations (NE) were provided (extending Zhang & Graham, 2019). Listening proficiency was a more important factor influencing vocabulary learning through aural input than PVK was, with the most notable gains for learners with high listening proficiency and low PVK. The CFoF approach was the most helpful for learners regardless of their PVK and listening proficiency, whereas the NE approach was the least helpful. Moreover, comparing just the CS and L2 groups, the CS approach was more helpful than the L2 approach for lower PVK learners and for more proficient listeners. Higher PVK learners and less proficient listeners, however, benefited more from the L2 approach than from the CS approach. The study highlights the complex interplay of vocabulary


Introduction
Helping learners to gain a wide range of vocabulary knowledge is fundamental to improving their general language proficiency globally, and no less so in classes teaching English as a foreign language (EFL) in China (Silver, Hu, & Iino, 2002). Discussion often focuses on the comparative merits of intentional learning and of incidental learning, where learners unintentionally "pick up" vocabulary knowledge while they are focused on understanding the meaning of the language input (Hulstijn, 2001). Whereas vocabulary gains tend to be smaller for the latter (Laufer & Girsai, 2008), perhaps because learners' attention is on global meaning rather than on individual vocabulary items, pedagogical activities can be used alongside a focus on meaning to enhance the salience of items (Sharwood Smith, 1991), and hence noticing (Schmidt, 1990) and thence vocabulary learning (Laufer, 2005).
The extent to which learners of different proficiency levels in a second language (L2) benefit from different types of attention-enhancing pedagogical activities is, however, underexplored, particularly in relation to aural input and the role of the first language (L1). A consideration of those issues is important, not only from a pedagogical perspective, but also within models of L2 vocabulary acquisition in which the L1 functions differently at different levels of proficiency (Jiang, 2002, 2004; Kroll & Stewart, 1994). This study therefore investigated the impact of L2 learners' preexisting vocabulary knowledge (PVK) and listening proficiency on their vocabulary learning from attention-enhancing activities in the form of different types of vocabulary explanations after listening: no explanations (NE), explanations in the L2, codeswitched explanations, and explanations that give contrastive focus-on-form (CFoF) information.

Background Literature
The extent to which vocabulary knowledge can be acquired through listening has received less research attention than is the case for reading. A common thread running through the literature, however, is, first, that levels of vocabulary learning from listening are typically much lower than for reading (Brown, Waring, & Donkaewbua, 2008; Vidal, 2011); and, second, that listening may develop what van Zeeland and Schmitt (2013a) call the earlier-acquired aspects of vocabulary knowledge, such as form recognition, rather than "high levels of knowledge" (p. 609), such as meaning recognition and recall. Amounts of learning through listening may also depend on learners' level of proficiency, in the form of either general proficiency, prior vocabulary knowledge, or specifically listening proficiency. For example, Vidal (2011), in a study comparing incidental vocabulary learning by university-level L2 learners through both reading and listening, found that for pretest to posttest gains, the higher learners' general proficiency level was, the smaller was the difference between gains made from reading and gains made from listening. Lower proficiency learners, especially at the very lowest levels of proficiency, retained very few words from the spoken input (fewer than from reading), possibly, according to Vidal, because they were hampered by difficulties in segmenting words from the speech stream, rendering attention to vocabulary problematic.
Existing vocabulary knowledge specifically might also be expected to influence how much vocabulary is gained through spoken input, given that vocabulary breadth is positively correlated with listening proficiency (Staehr, 2008); learners with larger vocabulary sizes might go on to acquire more words from aural input because they comprehend more of the input in the first place. This has been found to be the case in studies of incidental learning through listening as part of video viewing (Peters & Webb, 2018). By contrast, Rodgers (2013) found no effect of vocabulary knowledge on learning gains from television viewing among intermediate university learners of English.
These somewhat contradictory findings may result from the relationship between vocabulary knowledge and listening comprehension being more complex than that between vocabulary knowledge and reading, with implications for how important vocabulary knowledge is for vocabulary learning through listening. First, a wide range of correlations between vocabulary knowledge and listening has been found in previous research, ranging from r = 0.209 to r = 0.941 in an unpublished meta-analysis of 26 studies from 2000 to 2018 (Smith, 2019). This range may reflect differences across studies in how vocabulary knowledge has been measured. Commentators such as Staehr (2009) argue that aural rather than written vocabulary tests should be used in explorations of the relationship, a view supported empirically by Cheng and Matthews (2018).
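Squaring a correlation gives the proportion of variance two measures share, which makes the width of this range concrete: from roughly 4% to roughly 89% shared variance. The short Python sketch below simply performs that arithmetic on the two bounds quoted above; it is an illustration, not part of the cited meta-analysis.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance shared by two measures with Pearson correlation r."""
    return r ** 2


for r in (0.209, 0.941):  # lower and upper bounds reported by Smith (2019)
    share = shared_variance(r)
    print(f"r = {r:.3f}: r^2 = {share:.3f} ({share:.1%} shared variance)")
```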
Second, even though some strong correlations have been reported between vocabulary size and listening comprehension, they generally tend to be lower than those reported for reading (Staehr, 2008). Vocabulary knowledge explains less variance in listening than in other skills (Miralpeix & Muñoz, 2018), and lower levels of vocabulary knowledge seem to be needed for aural comprehension than for reading comprehension (van Zeeland & Schmitt, 2013b). L2 listeners can, and may indeed have to, draw on factors other than vocabulary knowledge when interpreting spoken input, for example, contextual information from gesture, tone of voice, and facial expression. Thus higher vocabulary knowledge does not always equate to better listening; for example, van Zeeland and Schmitt (2013b), in a study of university-level learners of English, found that higher levels of vocabulary knowledge generally led to higher levels of comprehension, but that some learners with lower levels of lexical knowledge also achieved "adequate" comprehension. Bonk (2000) reports similar findings for Japanese university learners of English: some with lower levels of lexical knowledge achieved good comprehension, whereas others with higher levels had quite poor comprehension. These studies suggest that it might be useful to consider the effect of listening proficiency as well as vocabulary knowledge on how much vocabulary is acquired through spoken input.
The extent to which vocabulary knowledge is developed through listening may also depend on the type of listening activity engaged in and the types of additional support offered. Vocabulary learning during listening for meaning can be enhanced through a lexical focus-on-form approach (Laufer, 2005; Laufer & Girsai, 2008), in which the salience of lexical items is heightened so that the learners' attention is drawn in some way to phonological, orthographic, and semantic information about vocabulary items, thus increasing noticing (Schmidt, 1990). Indeed, the fleeting nature of spoken input means that lexical focus-on-form through some kind of input enhancement may be especially important for vocabulary learning through listening, where "focal" attention (Ellis, 1999, p. 35) is typically on broad understanding and only "peripheral" attention is given to individual linguistic items (Vidal, 2011). Enhancement of spoken input may include, among other things, the provision of video captions or annotations (Montero Perez & Desmet, 2012), or explanations or elaborations of the target items (Hennebry, Rogers, Macaro, & Murphy, 2013; Tian & Macaro, 2012; Vidal, 2011).
Different forms of input enhancement are likely to vary in the degree of "involvement load" (Hulstijn & Laufer, 2001, p. 539) they prompt, however, and hence potentially lead to different amounts of learning. Involvement load is influenced by "need," "search," and "evaluation" (p. 539): that is, the extent to which learners have to understand a given item for task completion (need), seek out the item's meaning (search), or consider whether a given meaning or use for a word is the most appropriate one for a certain context (evaluation). The level of evaluation is considered to be especially important (Hulstijn & Laufer, 2001), as more evaluation usually brings with it deeper processing and is therefore more likely to lead to better learning outcomes. Laufer and Girsai (2008) draw on such a framework to explain the greater gains made by learners experiencing a lexical focus-on-form approach in the context of reading. This approach, which they termed "contrastive form-focused instruction," provided input on "the similarities and differences between [the] L1 and L2 in terms of individual words and the overall lexical system" (p. 696). Further evidence of the potential value of contrastive form-focused instruction as a type of vocabulary enhancement emerged from a study by Zhang and Graham (2019) in the context of listening, an investigation whose data we returned to for the present study. School-aged learners in China listened to a series of passages in English and then received one of four types of instruction: L2 explanations of vocabulary items in the passages; codeswitched (L1) explanations; explanations providing additional crosslinguistic information (CFoF); or no explanations (NE), with cultural information related to the passages given instead. For both short- and long-term vocabulary learning, the three treatment groups significantly outperformed the NE group. Whereas no statistically significant differences were found between the L2 and codeswitching (CS) groups for short-term and long-term improvement, gains for the CFoF group were significantly greater than for the L2 and CS groups (all as reported in Zhang & Graham, 2019). As the focus of that study was on exploring whether different types of input enhancement had differing impacts on vocabulary learning through listening, learners' PVK and listening proficiency levels were not of primary interest; instead, they were treated as covariates (i.e., two continuous control variables) in the data analysis to control for any baseline differences between the intact groups included in the quasi-experimental design.
The study by Zhang and Graham (2019) is one of a growing number exploring vocabulary enhancements in the form of explanations by the teacher, either before or after listening. Such explanations are likely to be especially important for lower proficiency, beginning language learners in a classroom setting, where listening activities are often carried out under the guidance of the teacher rather than independently. For example, Pujadas and Muñoz (2019) conducted a 1-year intervention with 74 secondary school learners of English in Grade 8 (aged 13-14 years) in Spain. Learners watched 24 episodes of a TV series in one of four conditions: (a) L2 captions (written on the screen) and vocabulary preteaching, (b) L2 captions and no preteaching, (c) L1 subtitles and preteaching, and (d) L1 subtitles and no preteaching. All groups made significant vocabulary gains, but preteaching of items, with either L2 captions or L1 subtitles, led to the greatest gains. Higher general proficiency (measured by Oxford Placement Test scores) was related to higher vocabulary learning gains, with learners at the A2/B1 level showing significantly greater gains than those at A1 or Pre-A1.
A slightly larger group of studies has considered the impact of post-listening explanations on vocabulary learning, also with a focus on the interaction between learner proficiency level and type of explanation. Working with university EFL learners within a lexical focus-on-form approach, Tian and Macaro (2012) explored the impact of post-listening explanations that used either the L1 (CS group) or the L2 only. For analysis, learners were allocated to four proficiency levels according to their scores on a listening test and a vocabulary pretest. The authors hypothesized that lower proficiency learners would benefit more from L1 than from L2 explanations. This hypothesis would be supported by models of vocabulary acquisition such as those of Jiang (2002, 2004) and Kroll and Stewart (1994), in which the L1 is the dominant language at the initial stage of L2 learning and L2 words are lexically mediated by the L1. In Tian and Macaro's study, the CS group outperformed the L2-only group on short-term vocabulary learning, but the interaction between group and proficiency level was not statistically significant. That is, learners in both treatment groups made significant pre-post vocabulary gains regardless of their proficiency level. Tian and Macaro (2012) suggested that the absence of a proficiency effect may be attributable to the low frequency level of the target items in their study, for which higher proficiency learners as well as lower proficiency learners needed support from the L1. An absence of a proficiency effect was also reported in a subsequent study with a similar design but with younger, high school learners of French (Hennebry et al., 2013). Different findings emerged, however, from a study closely modeled on that of Tian and Macaro with adult EFL learners, in which Lee and Levine (2020) investigated whether learners' proficiency level (intermediate vs. advanced) interacted with the two types of vocabulary instruction provided for listening input (L2 English-only vs. codeswitched L1 vocabulary explanations). Results indicated that whereas advanced learners learned and retained similar amounts of vocabulary knowledge across the two teaching conditions, the intermediate-level learners who received codeswitched L1 explanations significantly outperformed those who were given L2 English-only explanations in terms of long-term vocabulary retention. Additionally, these intermediate learners gained as much vocabulary knowledge as the advanced learners within the codeswitched L1 group.
Important though these studies are for their insights into vocabulary learning through listening, it should be noted that all of them treated proficiency level as a categorical rather than a continuous (interval) variable, making the analyses less sensitive than they might have been. Categorizing continuous predictors into groups (by using, for example, a median split) has several limitations (Aiken & West, 1991). First, the distribution of values within each category is highly likely to be skewed. Second, two very close values can be arbitrarily allocated to the lower band of a "high" category and the upper band of a "low" category. Moreover, by dividing a sample in this way, all values falling into one category are treated as equivalent, when in fact they are not. In sum, the variance offered by a continuous variable is lost when it is converted into categorical data, and it is precisely such variance in proficiency that may reveal its potentially subtle effects.
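A small simulation, using invented data rather than any study's scores, illustrates the information loss. A normally distributed "proficiency" predictor is generated with a true correlation of about .5 with a hypothetical "gains" outcome; after a median split the observed correlation shrinks to roughly .4, close to the expected attenuation factor of about .80 for dichotomizing a normal variable at its median. The seed, sample size, and effect size are all arbitrary assumptions.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def median_split(x):
    """Recode a continuous predictor as 0 (at or below the median) or 1 (above)."""
    med = statistics.median(x)
    return [1 if v > med else 0 for v in x]


rng = random.Random(2019)  # fixed seed so the simulation is reproducible
n = 10_000
proficiency = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical outcome with unit variance and a true correlation of .5
gains = [0.5 * p + rng.gauss(0, 0.75 ** 0.5) for p in proficiency]

r_continuous = pearson(proficiency, gains)
r_split = pearson(median_split(proficiency), gains)
print(f"continuous r = {r_continuous:.3f}, median-split r = {r_split:.3f}")
```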
Researchers thus have relatively few clear insights into the role of proficiency level in vocabulary learning through spoken input, particularly in relation to the interaction between proficiency and vocabulary enhancement type, and the small number of studies that have been undertaken report contradictory findings. Additionally, relatively little attention has been paid to these issues in school settings, where questions of proficiency and how it interacts with instructional approaches are especially important. The current study therefore addressed the need for further research of this kind.

The Current Study
In the current study we returned to our previous study of high school vocabulary learning through listening (Zhang & Graham, 2019). We used data collected for that study but took a different perspective on the learning differences between the CS, L2, CFoF, and no explanation (NE) groups by going one step further: We included learners' PVK and listening proficiency as two predictors (of primary interest within the study design) and, more importantly, explored the interaction between these two predictors and other predictors, namely, (a) the time points at which the tests for the target vocabulary items were administered and (b) the four types of vocabulary explanation.
We posed two research questions in relation to post-listening vocabulary explanations (for L2, CS, CFoF, and NE conditions):
1. To what extent does vocabulary learning through listening, taking all conditions together, vary according to learners' PVK and listening proficiency?
2. To what extent does the impact of the different types of post-listening vocabulary explanation vary according to learners' PVK or listening proficiency?

Sampling and Procedures
Following Tian and Macaro (2012), the study employed a quasi-experimental design, involving 137 first-year senior-secondary school EFL learners from four intact classes at one school in China (aged 15-16, with 7 years' English learning experience). Learners were preparing for the Gaokao, China's national university entrance exam, and hence had a proficiency level of around A2 to B1 on the Common European Framework of Reference for Languages (CEFR; Council of Europe, 2001), or around levels 3-4 on the International English Language Testing System (IELTS; https://takeielts.britishcouncil.org/teachielts/test-information/scores-explained). Classes were randomly assigned to three treatment groups, namely a second-language (L2) group (n = 35), a teacher-CS group (n = 36), and a CFoF group (n = 33), and one no-explanation (NE) group (n = 33). All groups were instructed by the first author (whose L1 is Chinese). In our choice of group names, we followed Tian and Macaro (2012) in using "codeswitching" to signal "principled rather than ad hoc L1 use" by the teacher (p. 69), although we acknowledge, as they did, that the term has a somewhat different meaning outside of the classroom. The essence of the CFoF approach was to provide crosslinguistic information about items' use rather than simply giving their meaning in the L1 (Laufer & Girsai, 2008).
The data collection procedure began with a general vocabulary knowledge test (GEVT), a vocabulary pretest, and a listening comprehension test (week 1). Six teaching sessions took place between weeks 4 and 9. All groups completed a vocabulary posttest at the end of each session. From the third teaching session onward (week 6), a vocabulary delayed posttest was also administered at the same time as the vocabulary posttest. There were six delayed posttests in total, with the final two administered in weeks 10 and 11, after the completion of the teaching sessions. Each delayed posttest assessed long-term vocabulary retention of the target items from the session delivered two weeks previously, so that all target items had an equal delay of two weeks between the posttest and the delayed posttest. Timings for all aspects of individual lessons (including test administration) were tightly controlled through detailed lesson plans strictly followed by the teacher, both to ensure uniformity across the groups and to fit within the 45-minute lesson time required by the school. The study design is outlined in Figure 1.
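The timing of the delayed posttests can be checked with a short Python script. It assumes one teaching session per week. This is consistent with, but not stated explicitly in, the description above: six sessions fall in weeks 4 to 9, and the third session falls in week 6. The constant names are illustrative only.

```python
FIRST_SESSION_WEEK = 4  # teaching began in week 4 (baseline tests were in week 1)
N_SESSIONS = 6
DELAY_WEEKS = 2         # each item's delayed posttest came two weeks after teaching


def schedule():
    """(session, teaching week, delayed-posttest week), assuming one session per week."""
    return [(s, FIRST_SESSION_WEEK + s - 1, FIRST_SESSION_WEEK + s - 1 + DELAY_WEEKS)
            for s in range(1, N_SESSIONS + 1)]


last_teaching_week = FIRST_SESSION_WEEK + N_SESSIONS - 1  # week 9
for session, taught, delayed in schedule():
    if delayed > last_teaching_week:
        when = "after teaching ended"
    else:
        when = f"alongside session {session + DELAY_WEEKS}"
    print(f"Session {session}: taught in week {taught}, "
          f"delayed posttest in week {delayed} ({when})")
```

Under this assumption the first delayed posttest coincides with session 3 in week 6, and the last two fall in weeks 10 and 11, matching the text.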

Intervention Procedures and Materials
The intervention was implemented over six sessions for all groups. In each session, learners heard a prerecorded listening passage (once) and then answered three written comprehension questions, one eliciting understanding of the global meaning and two eliciting understanding of specific and important details, so that learners' focus was on comprehending what the passages were about. The nature of the intervention then differed between the NE group and the three treatment groups (L2, CS, and CFoF). For the treatment groups, the listening passage was played once more, sentence by sentence. More specific comprehension questions were asked orally at this stage in order to focus learners' attention on listening comprehension and to initiate active classroom participation. Subsequently, the teacher gave explanations of the target vocabulary items, again geared toward meaning comprehension of the passage, but each of the three treatment groups received a different form of vocabulary explanation: L2 only, codeswitched, or CFoF.
Steps were taken to ensure that each treatment group received the same amount of vocabulary explanation for a specific vocabulary item (see Appendix S1 in the Supporting Information online for examples of explanations). Thus, the L2 group learners first received a short sentence in English (the L2) explaining the meaning of the target vocabulary item. Then, they were given an additional L2 sentence including the target item and were required to use the L2 explanation of the target item that they had been given to paraphrase in the L2 the meaning of the additional L2 sentence. In the CS group, the meaning of the target lexical item was given by the teacher in Chinese. Learners were also given an additional L2 sentence including the target item and were asked to show understanding of the meaning of the sentence by translating it into Chinese. In the CFoF group, learners were initially given the L1 meaning of the target vocabulary item, and then an additional explanation was provided in the L2, comparing and contrasting the L2 word and its L1 translation and drawing attention to any mismatch between the two. All groups also saw the written forms of the target items, presented through PowerPoint, but they were not allowed to write them down.
Finally, the instructor read out the whole listening passage once more and repeated the L2, CS, or CFoF explanations of the target items (according to group), stopping after each target item while reading out the text. All groups heard the passage three times in total. Learners would already have been somewhat familiar with all three approaches to vocabulary explanation, because their class teachers normally used these approaches when explaining new vocabulary.
Learners in the NE group first heard the same listening passage once and completed the same written listening comprehension questions as learners in the three treatment groups. The listening passage was then replayed sentence by sentence twice for them. In addition, the teacher gave them cultural and background information in the L2 relating to the listening passage but unrelated to the target vocabulary items. Therefore, like learners in the three treatment groups, learners in the NE group also heard the listening passage three times in total.

Baseline Vocabulary Knowledge and Listening Comprehension Tests
Before the intervention, learners' PVK was assessed through an aural 160-item general vocabulary knowledge test (GEVT). This was based on the aural vocabulary levels test (McLean, Kramer, & Beglar, 2015) and designed to measure learners' existing vocabulary levels and also their existing knowledge of the target vocabulary items for the intervention. Piloting indicated that the three most frequent 1,000-word bands (1K, 2K, and 3K) and the academic word list were appropriate for the proficiency level of the participants, so our GEVT drew on 100 items from those lists. An additional 60 target items, measuring learners' partial knowledge of the target lexical items to be presented in the intervention, were intermingled with the 100 items so that the participants would not know which items were the focus of the study. This made 160 items in total. See https://www.iris-database.org/iris/app/home/detail?id=york:937834 for the test. The test format, meaning recognition with multiple choice, allowed for the assessment of partial knowledge of a large number of items. The researcher read out the target lexical item, first on its own and then in a sentence that gave no clue to its meaning. Participants then had to select the correct Chinese translation of the English word from four options (one correct Chinese translation and three distractors). An example item for this test is given in Figure 2.
Language Learning 00:0, xxxx 2020, pp.1-37
[Figure 2: Example GEVT item. Participants hear: "Time, they have a lot of time." The answer options are Chinese translations, glossed here as, e.g., (time) and D. (friends).]
The English translations in parentheses are given for clarification and did not appear in the test paper. Cronbach's alpha for the GEVT was .76.
Learners' listening proficiency was assessed at baseline through the first two sections of a standard IELTS listening test (https://takeielts.britishcouncil.org/take-ielts/prepare/free-ielts-practice-tests/listening; these tests cannot be made openly available because they are proprietary). These two sections were deemed appropriate for the level of the participants because they were set in an everyday social context and hence assessed listening at around IELTS levels 3-4 (British Council, n.d.). Cronbach's alpha for this listening test was .62, comparable with what has been reported for subsections of the IELTS test elsewhere (Breeze & Miller, 2012).
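Cronbach's alpha is reported for both baseline instruments. For readers less familiar with the coefficient, here is a minimal Python sketch that computes it from a learners-by-items matrix of scores. It uses the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The toy matrix is invented for illustration, and this is not the software the authors used.

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items matrix of scores.

    Sample variances are used throughout; the ratio is unchanged if
    population variances are used consistently instead.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [statistics.variance([row[j] for row in scores]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: 4 learners x 3 dichotomously scored items (1 = correct)
toy = [[1, 1, 1],
       [0, 1, 0],
       [1, 0, 1],
       [0, 0, 0]]
print(round(cronbach_alpha(toy), 2))  # 0.6
```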

Listening Passages
Materials for the intervention were chosen from an English textbook, New Senior English for China (Liu et al., 2007), in order to maximize the ecological validity of the materials used. Although aimed at learners of the same proficiency level as the participants, the textbook was not used in the province where the school was located, so there was very little possibility of the participants having access to it before the intervention or outside of class. Six passages were identified and adapted to ensure that they were on topics relevant to learners and were all around 250 words long. They were then recorded by L1 English speakers at a speech rate of approximately 150-190 words per minute, that is, at the lower end of average speech rates for radio monologues and conversations in British English, according to Tauroza and Allison (1990).
The textbook had already highlighted the words and collocations that, based on the senior high school English curriculum (Ministry of Education, 2003), should be new to, and therefore taught to, EFL students at the participants' proficiency level. The six listening passages were examined carefully to make sure that no more than 5% of the words in each passage were highlighted as new to learners, thereby reaching the threshold of 95% lexical coverage for listening comprehension (van Zeeland & Schmitt, 2013b). Two online vocabulary profilers (VP-Classic and VP-Compleat; Cobb, n.d.) showed that the items to be taught to learners consisted of (a) 43 single words, mainly from the 1K, 2K, and 3K most frequent bands and from the academic word list (18 nouns, 13 verbs, and 12 adjectives); and (b) 17 collocations, which we define as groups of words "that belong together, either because they commonly occur together . . . or because the meaning of the group is not obvious from the meaning of the parts" (Nation, 2001, p. 317). We therefore had 60 target items in total, 10 in each listening passage. All listening passages, their vocabulary profiles, and their target lexical items are provided in the Supporting Information online (in Appendices S2, S3, and S4, respectively). Learners in all four groups listened to all six passages.
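The 5% criterion translates into a concrete ceiling on new words per passage. The Python sketch below assumes that coverage is counted over running words (tokens), with each highlighted word counting as one token. It uses the approximate passage length of 250 words. How multiword collocations were counted toward the 5% is not specified above and is left open here.

```python
def lexical_coverage(total_tokens: int, unknown_tokens: int) -> float:
    """Proportion of running words in a text that the listener already knows."""
    return (total_tokens - unknown_tokens) / total_tokens


THRESHOLD = 0.95      # coverage figure cited from van Zeeland & Schmitt (2013b)
passage_length = 250  # approximate length of each passage, in words

# Largest number of new tokens a 250-word passage can contain while meeting 95%
max_new = max(u for u in range(passage_length + 1)
              if lexical_coverage(passage_length, u) >= THRESHOLD)
print(max_new)  # 12
```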
As part of the teaching procedures, learners handed in their written responses to the three initial comprehension questions administered after the first hearing of each passage. Although these responses did not form part of the data used to judge the effectiveness of the interventions, they were reviewed to check that learners in each group had good comprehension of the six listening passages. On average, learners gave correct answers to more than two thirds of the 18 questions, and this was true across all four groups (L2: M = 12.71, SD = 1.87; CS: M = 12.27, SD = 1.62; CFoF: M = 12.37, SD = 2.06; NE: M = 12.30, SD = 2.00). In addition, a one-way ANOVA indicated that the four groups did not differ significantly in their comprehension of the passages, F(3, 133) = 0.40, p = .75, η² = .01.
[Figure 3: Example posttest item. Participants hear: "Overcome, we need to overcome this." The answer sheet provides a blank for the meaning and a confidence scale from 0 to 5.]
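Because the group sizes, means, and SDs are all reported, the omnibus comprehension test can be recomputed directly from them. With four groups and 137 learners, a one-way between-groups ANOVA has 3 and 133 degrees of freedom. The Python sketch below assumes that each group's n equals its full size as given under Sampling and Procedures, that is, no missing responses. It recovers an F of about 0.40 and η² of about .01.

```python
def anova_from_summary(groups):
    """One-way between-groups ANOVA from (n, mean, sd) summaries.

    Returns (F, df_between, df_within, eta_squared).
    """
    N = sum(n for n, _, _ in groups)
    k = len(groups)
    grand_mean = sum(n * m for n, m, _ in groups) / N
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between, df_within = k - 1, N - k
    F = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return F, df_between, df_within, eta_sq


# (n, M, SD) for comprehension scores out of 18, as reported above
summaries = {
    "L2": (35, 12.71, 1.87),
    "CS": (36, 12.27, 1.62),
    "CFoF": (33, 12.37, 2.06),
    "NE": (33, 12.30, 2.00),
}
F, df1, df2, eta_sq = anova_from_summary(list(summaries.values()))
print(f"F({df1}, {df2}) = {F:.2f}, eta^2 = {eta_sq:.2f}")
```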

Vocabulary Posttests and Delayed Posttests
The impact of the listening sessions on learners' vocabulary knowledge was assessed through a vocabulary posttest and delayed posttest based on the test used by Tian (2011), modified to take an aural form in which the teacher read out one target vocabulary item plus an additional sentence including the item. See https://www.iris-database.org/iris/app/home/detail?id=york:937834 for the tests. Learners then had 10 seconds to give their response. If they did not know the meaning of the item, they were asked to circle 0 on the answer sheet. If they knew the meaning of the item, however, they were required to write it down either in Chinese or the very same word in English and to circle a number from 1 to 5 indicating how confident they felt about the meaning they provided. Figure 3 shows an example item. Hence, although meaning recognition was assessed at pretest, meaning recall (without multiple choice responses) was assessed at posttest and delayed posttest, to provide a more stringent measure of vocabulary growth (see the sections below, Data Analysis and Limitations and Future Direction).
also loaded onto one factor, explaining 57.56% of the variance. As these factor loadings were relatively high (.58 to .94), it was considered justifiable to aggregate scores for all six posttests and, separately, all six delayed posttests, giving one total score for each. The reliability (Cronbach's alpha) was .94 for the aggregated vocabulary posttest and .92 for the delayed posttest. The GEVT was used to give an indication of participants' relative vocabulary level through the number of correct items out of a total possible score of 100 (i.e., excluding the items used in the intervention), rather than an estimate of their vocabulary size.
The quantitative data were analyzed both by participant (137 participants) and by item (60 vocabulary items, coded 1 if correct and 0 if wrong). The outcome variable in our analyses was therefore binary. Binary logistic regression is normally recommended for analyzing data with a single binary outcome variable and one or more continuous or categorical predictor variables. In order to control for both by-participant and by-item random effects, however, we adopted generalized linear mixed effects models, which allowed us to run binary logistic regressions with a random effects structure. The models were run with the lmerTest package (Kuznetsova, Brockhoff, & Christensen, 2017), a package based on lme4, in R (version 3.5.0; R Development Core Team, 2018). Random effects were fitted using a maximal random effects structure (Barr, Levy, Scheepers, & Tily, 2013). In cases where a model with the full random effects structure did not converge, we first removed the interactions between random slopes and then gradually removed the random slopes that accounted for the least variance until a converged model was obtained.
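The simplification procedure for non-converging models can be sketched as a loop. This is a hypothetical Python illustration, not the authors' code. The slope labels (written `Factor|Grouping` purely as labels, not as lme4 formula syntax), the variances, and the `converges` function, which stands in for refitting the model in R, are all assumptions. In practice variances are re-estimated after each refit; here they are held fixed for simplicity.

```python
def prune_random_slopes(slopes, converges):
    """Simplify a maximal random-slope structure until the model converges.

    slopes    -- dict mapping a random-slope label to its estimated variance
    converges -- hypothetical stand-in for refitting the model; takes a dict
                 of slopes and returns True if that structure converges
    """
    current = dict(slopes)
    if converges(current):
        return current
    # Step 1: drop interactions between random slopes (labels containing ':')
    current = {term: var for term, var in current.items()
               if ":" not in term.split("|")[0]}
    # Step 2: drop the slope with the smallest variance, one at a time
    while current and not converges(current):
        weakest = min(current, key=current.get)
        del current[weakest]
    return current


# Hypothetical variances and a mock convergence rule (at most two slopes)
maximal = {
    "Time|Participant": 0.05,
    "Time|Item": 0.90,
    "Group|Item": 0.60,
    "Time:Group|Item": 0.30,
}
final = prune_random_slopes(maximal, converges=lambda s: len(s) <= 2)
print(sorted(final))  # ['Group|Item', 'Time|Item']
```

With these made-up inputs the loop happens to end at by-Item slopes for Time and Group. That is coincidental to the invented numbers and does not show how the authors reached their final model.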

Results
We first calculated descriptive statistics for the two baseline tests and the vocabulary pretests, posttests, and delayed posttests for each group, in order to explore (a) the extent to which vocabulary learning across all conditions together varied according to learners' PVK and listening proficiency, and (b) the extent to which the impact of different post-listening vocabulary explanations in the four groups varied according to learners' PVK or listening proficiency. Table 1 shows the results.
Four fixed effects were entered into our first model: Time (1. Pretest, 2. Posttest, 3. Delayed posttest); Group (CFoF, CS, L2, NE); GEVT (vocabulary test); and Listening (listening comprehension test). Time 1 was set as the baseline for Time, and CFoF as the baseline for Group. Three-way interactions involving the fixed effect of Time were also added to the fixed effects structure. The Time × GEVT × Listening interaction was examined to address the first research question, and the Time × Group × GEVT and Time × Group × Listening interactions were examined to address the second research question.
Language Learning 00:0, xxxx 2020, pp. 1-37
Because the model included interactions between categorical and continuous predictors, and the two continuous predictors, GEVT and Listening, were not on the same measurement scale, we entered their standardized z scores into the model. In this way, both continuous predictors were mean centered and on the same measurement scale.
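The standardization step can be sketched in Python as follows. The raw scores are hypothetical, and the sketch divides by the sample SD (n − 1 denominator), as R's scale() does; the paper does not say which function the authors used.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize raw scores: subtract the mean, divide by the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical raw scores (out of 100) for five learners.
gevt_raw = [52, 61, 70, 74, 88]
listening_raw = [40, 55, 58, 63, 79]
gevt_z = z_scores(gevt_raw)
listening_z = z_scores(listening_raw)
# Both predictors now have mean 0 and SD 1, so a one-unit change in either
# corresponds to one SD of that test, and 0 marks the sample mean.
print([round(z, 2) for z in gevt_z])
```

Centering also means that the lower-order coefficients in an interaction model describe effects at the mean of the other continuous predictor rather than at a raw score of zero.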
The random factors included Participants and Items. The random effects structure of the converged model included intercepts for Participants and Items and by-Item random slopes for Time and Group. This model represented a good fit to the data, R² marginal = .41 (variance explained by the fixed effects), R² conditional = .77 (variance explained by both the fixed effects and random effects), and there was no significant overdispersion or collinearity (all generalized variance inflation factors [GVIFs] < 3.5). Table 2 shows the results. There were significant three-way interactions for Time × Group × GEVT (lines 32 and 33 in the table), Time × Group × Listening (lines 27 and 30), and Time × GEVT × Listening (line 38).
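The paper does not state how the two R² values were computed. One widely used definition for logistic mixed models (Nakagawa and Schielzeth's latent-variable approach, implemented for example in R's MuMIn::r.squaredGLMM) fixes the residual variance on the logit scale at π²/3. A minimal Python sketch under that assumption follows; the variance components are hypothetical, not the authors' estimates.

```python
import math


def r2_glmm_logit(var_fixed, var_random_total):
    """Marginal and conditional R^2 for a logit-link mixed model
    (latent-variable definition; residual variance fixed at pi^2 / 3)."""
    var_residual = math.pi ** 2 / 3  # variance of the standard logistic distribution
    denom = var_fixed + var_random_total + var_residual
    r2_marginal = var_fixed / denom
    r2_conditional = (var_fixed + var_random_total) / denom
    return r2_marginal, r2_conditional


# Hypothetical inputs: variance of the fixed-effect linear predictor and the
# summed random-intercept variances. Random slopes, as in the authors' model,
# make the random-effect term depend on the covariates and are ignored here.
m, c = r2_glmm_logit(var_fixed=2.0, var_random_total=1.0)
print(round(m, 3), round(c, 3))  # 0.318 0.477
```

The conditional value can never be smaller than the marginal one, because it adds the random-effect variance to the numerator.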
We did not, however, interpret the model results any further directly, as all contrasts were made in relation to the baseline levels of the predictors. For example, the odds ratio in line 2 of Table 2 indicates that the odds of learners correctly recalling the meaning of the target vocabulary were 519.32 times greater at Time 2 than at Time 1 (the baseline level of Time), but only for learners in the CFoF group (the baseline level of Group) and with the two continuous predictors, GEVT and Listening, at their baseline value (i.e., zero, which after centering is the mean). Therefore, to give a clearer picture of the three-way interactions, we broke them down by running multiple pairwise comparisons using the emmeans package in R (Lenth, 2019). For each comparison, the p value was adjusted using Tukey's method to control the Type I error rate.
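Because odds ratios are easily misread as ratios of probabilities, the Python sketch below converts an odds ratio into the implied Time 2 probability for a given Time 1 probability. The odds ratio of 519.32 is taken from the text; the baseline probability of 0.02 is a hypothetical value chosen only for illustration, not an estimate from the study.

```python
def apply_odds_ratio(p_baseline, odds_ratio):
    """Probability implied by multiplying the baseline odds by odds_ratio."""
    odds = p_baseline / (1.0 - p_baseline) * odds_ratio
    return odds / (1.0 + odds)


p1 = 0.02  # hypothetical probability of a correct recall at Time 1
# Odds ratio for Time 2 vs. Time 1 (CFoF group, GEVT = Listening = 0), from the text.
p2 = apply_odds_ratio(p1, 519.32)
print(round(p2, 3), round(p2 / p1, 1))  # 0.914 45.7
```

Under this hypothetical baseline, the same odds ratio corresponds to a probability that is roughly 46 times higher, not 519 times higher, which is why the contrast is best described in terms of odds.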
The first step was to interpret the Time × GEVT × Listening interaction, which addresses the first research question. We calculated multiple comparisons between the three time points while holding GEVT scores constant at −2 (two standard deviations below the mean), 0 (at the mean), and 2 (two standard deviations above the mean), and Listening scores constant at −2, 0, and 2. The effect plot for these interactions is given in Figure 4. We set both GEVT and Listening between −2 and 2 because this range would be large enough to cover most of the observations in our dataset (i.e., from learners with very low PVK/Listening to learners with very high PVK/Listening) and would provide a clear picture of how learners with different levels of GEVT or Listening progressed in their vocabulary learning. Table 3 shows the full results for all pairwise comparisons, but our discussion will focus on the Time2-Time1 and Time3-Time1 contrasts, because Time 1 (pretest) was the baseline level of Time, and these two contrasts are of primary interest in indicating short-term and long-term learning, respectively.
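The idea of evaluating the Time contrast at fixed values of the standardized predictors can be sketched in Python. In a model with a Time × GEVT × Listening interaction, the log odds ratio for Time 2 versus Time 1 at GEVT = g and Listening = l is b_T + b_TG·g + b_TL·l + b_TGL·g·l. The coefficient values below are hypothetical placeholders, not the Table 2 estimates. The sketch omits the Group terms, so it corresponds to the baseline (CFoF) group rather than to the group-averaged contrasts that emmeans reports, and it leaves out standard errors and the Tukey adjustment. The final lines show that, if scores were roughly normal, ±2 SD would span about 95% of observations.

```python
import math
from statistics import NormalDist

# Hypothetical coefficients on the log-odds scale (NOT the Table 2 estimates).
b_time = 6.0                 # Time 2 vs. Time 1 at GEVT = 0, Listening = 0
b_time_gevt = -0.10          # Time 2 x GEVT
b_time_listen = 0.20         # Time 2 x Listening
b_time_gevt_listen = -0.15   # Time 2 x GEVT x Listening


def time_contrast_or(g, l):
    """Odds ratio for Time 2 vs. Time 1 at GEVT z = g and Listening z = l."""
    log_or = (b_time + b_time_gevt * g + b_time_listen * l
              + b_time_gevt_listen * g * l)
    return math.exp(log_or)


# Evaluate the contrast on the same 3 x 3 grid of z values used in the text.
for g in (-2, 0, 2):
    for l in (-2, 0, 2):
        print(f"GEVT z={g:+d}, Listening z={l:+d}: OR = {time_contrast_or(g, l):.2f}")

# Share of a normal distribution lying within two SDs of the mean.
coverage = NormalDist().cdf(2) - NormalDist().cdf(-2)
print(f"{coverage:.4f}")  # 0.9545
```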
Interpreting the results, we start with the Time2-Time1 contrasts, which indicated short-term learning. First, learners at all levels of GEVT and listening made significant short-term gains in scores on the vocabulary test. The odds ratios for these contrasts showed that the largest short-term gains were made by learners with the highest listening proficiency but with the lowest level of GEVT. In addition, learners with both the lowest GEVT scores and the lowest listening proficiency made the smallest short-term gains. Regarding long-term retention (Time3-Time1 contrasts), results were somewhat similar to what we found for the Time2-Time1 contrasts. The greatest long-term gains were again observed for learners with the highest listening proficiency but with the lowest GEVT level. The smallest gains, which were not statistically significant, were made by learners with the highest GEVT scores and the highest Listening proficiency.
Further rather complex but consistent patterns across the four groups emerged for short- and long-term vocabulary learning in relation to GEVT and listening levels (both Time2-Time1 and Time3-Time1 contrasts). First, the lowest level listeners (z score at −2) benefited more as their GEVT levels increased. For average to high level listeners (z scores at 0 and 2), however, vocabulary gains decreased as their GEVT scores increased. Second, both middle and lower level GEVT learners (z scores at 0 and −2) made more gains on the vocabulary tests as their listening proficiency increased. This was also true for the short-term learning of the higher GEVT learners (z score at 2). The long-term learning of these learners, however, benefited more with every decrease of one SD in their listening proficiency.
Turning to our second research question, comparing the effect of PVK and listening proficiency respectively on vocabulary learning for each of the four groups, we first obtained and then plotted the Time × Group × GEVT interactions (Figure 5). Then, while holding the Listening score constant at 0, we ran multiple comparisons between the three test time points by Group at three levels of GEVT: −2, 0, and 2. Table 4 shows the full results, but our discussion will again focus on the contrasts between Time2 and Time1 and between Time3 and Time1.
Results in Table 4 indicate that for short-term learning (Time2-Time1 contrasts), all groups made significant gains at all GEVT levels, with the greatest gains in the CFoF group and the smallest in the NE group, as indicated by the odds ratios. Comparing odds ratios across GEVT levels also showed that with every increase of one SD in learners' GEVT level, the gains on the vocabulary tests became smaller. This was the case for all groups. For example, for CFoF group learners with lower GEVT scores (z score at −2), the odds of successfully recalling the meaning of the target vocabulary were 608.89 times greater at Time 2 than at Time 1. For their counterparts with average or higher GEVT levels (z score at 0 or 2, respectively), by contrast, the corresponding odds ratios were only 519.32 and 442.93. In addition, although the CS approach showed a relatively large advantage over the L2 approach for lower level GEVT learners (odds ratio: 177.41 vs. 27.35), this advantage seemed to shrink with every increase of one SD in learners' GEVT level. Indeed, when learners' GEVT z score reached 2, the odds ratio for the CS group contrast was only slightly higher than that for the L2 group (27.72 vs. 20.25).
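A quick arithmetic check, sketched in Python, shows why the CFoF odds ratios in this paragraph fall by a constant factor. When GEVT enters the Time contrast linearly on the log-odds scale (as it does in a model with Time × GEVT terms, with Listening held at 0), the log odds ratio changes by the same amount for each SD of GEVT. The three odds ratios are taken from the text; the per-SD factor is derived here from those rounded values and is not a figure the authors report.

```python
import math

# Time 2 vs. Time 1 odds ratios for the CFoF group at GEVT z = -2, 0, 2 (from the text).
or_by_z = {-2: 608.89, 0: 519.32, 2: 442.93}

log_or = {z: math.log(v) for z, v in or_by_z.items()}
step_low = log_or[0] - log_or[-2]   # change in log OR from z = -2 to z = 0
step_high = log_or[2] - log_or[0]   # change in log OR from z = 0 to z = 2

# Equal steps mean the contrast is linear in GEVT on the log-odds scale.
print(round(step_low, 4), round(step_high, 4))

# Implied multiplicative change in the odds ratio per one-SD increase in GEVT.
per_sd_factor = math.exp((log_or[2] - log_or[-2]) / 4)
print(round(per_sd_factor, 2))  # 0.92
```

In other words, each one-SD increase in GEVT multiplies this odds ratio by roughly 0.92, which is what "the gains became smaller with every increase of one SD" amounts to on the odds scale.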
Regarding differences in long-term vocabulary retention (Time3-Time1 contrasts) between groups, the CFoF group again made the largest gains at all levels of GEVT, with the largest odds ratios observed. Unlike what was found for short-term learning, the NE group made no significant progress at the lower GEVT level and showed significant decreases in vocabulary scores at the average and higher GEVT levels. In addition, comparing odds ratios across GEVT levels showed that the gains for learners in the CFoF group became larger with every increase of one SD in learners' GEVT scores (49.05 vs. 51.12 vs. 53.27). The gains for the CS and L2 groups, however, decreased as learners' GEVT levels increased. Finally, a further comparison of odds ratios between the CS and L2 groups indicated that learners at the lower GEVT level (−2) benefited more from the CS approach than from the L2 approach, whereas learners at the average (0) and higher (2) GEVT levels were helped more by the L2 approach than by the CS approach.
For listening, we followed a similar procedure to that for the Time × Group × GEVT interactions. We first plotted the Time × Group × Listening interaction (Figure 6) and then ran multiple comparisons between the three test time points by Group at three levels of Listening: −2, 0, and 2. Table 5 presents the full results for all pairwise comparisons, but our discussion will again focus on contrasts between Time2 and Time1 and between Time3 and Time1.
The Time2-Time1 contrasts listed in Table 5 showed that learners at all three Listening levels in all four groups made significant short-term vocabulary gains. The odds ratios for these contrasts first indicated that, at all three Listening levels, the greatest vocabulary gains were made by the CFoF group, followed by the CS, L2, and finally the NE groups. Second, vocabulary gains across the four groups became larger with every increase of one SD in learners' Listening proficiency. Regarding long-term vocabulary retention (Time3-Time1 contrasts), only in the CFoF and L2 groups did learners at all three Listening proficiency levels make significant improvement. Whereas learners at both average and higher Listening proficiency levels in the CS group made significant gains, lower level listeners did not. Additionally, higher level listeners in the NE group made no significant gains, and average and less proficient listeners in the NE group showed a significant decrease in vocabulary scores between Time 1 and Time 3. Furthermore, although the lower and average level listeners in the L2 group showed larger gains than their counterparts in the CS group, higher level listeners showed similar progress in both groups, with very similar odds ratios (8.80 vs. 8.50).

Discussion
We summarize our findings as follows.
For our first research question, examining the impact of learners' PVK and listening proficiency on vocabulary learning through listening for all conditions together, we found that learners' listening proficiency played a more important role than PVK levels did, with the largest short-term and long-term vocabulary gains made by learners with the highest level of listening proficiency but the lowest level of PVK. In addition, whereas the lower level listeners made larger vocabulary gains with every increase of one SD in their PVK level, vocabulary learning gains for the average and higher level listeners decreased with every increase in their PVK level.
Our second research question asked whether the impact of post-listening vocabulary explanations in the L2, CS, CFoF, and NE groups varied according to learners' PVK or listening proficiency (summarized in Tables 6 and 7).
Turning first to PVK, our findings suggested that the CFoF approach was the most beneficial for short- and long-term vocabulary learning for learners across the PVK levels. Although lower PVK learners in the CFoF group made greater short-term gains than higher PVK learners did, the long-term learning gains in the CFoF group were greater for higher PVK learners. In other words, there were benefits at both PVK levels. These findings contrast with those for the NE group, which was the least beneficial teaching approach across all PVK levels for vocabulary learning.² Furthermore, within the CS, L2, and NE groups, the higher the learners' PVK level was, the smaller the short-term vocabulary gains they made. The same was true for long-term learning within the CS and L2 groups; for the NE group, no significant long-term gains were made by learners at any PVK level. Finally, differences emerged between the L2 and CS groups. Whereas higher PVK learners made greater long-term gains with the L2 approach, lower PVK learners benefited more from the CS approach.
Regarding whether the impact of post-listening vocabulary explanations in the L2, CS, CFoF, and NE groups varied according to learners' listening proficiency, findings were in some ways similar to what was found for PVK. First, CFoF again emerged as the most beneficial approach for both short-term and long-term learning for learners across the listening proficiency levels. Also similar to what was found for PVK, the NE teaching approach was the least beneficial across listening proficiency levels.
Second, and different from what was found for PVK, for short-term learning within the CS, L2, and NE groups, the higher the learners' listening level was, the larger the gains they made in vocabulary learning. The same was true for long-term learning within the CS and L2 groups, with no significant long-term learning gains for the NE group at any of the three listening proficiency levels.
Third, differences again emerged between the L2 and CS groups. Learners' listening proficiency seemed to be a stronger predictor of vocabulary learning in the CS group than in the L2 group. Learners at all listening proficiency levels benefited more from the CS approach than from the L2 approach for short-term learning, but the effect was most marked for the higher proficiency listeners. By contrast, for long-term learning, the CS and L2 approaches led to similar gains for higher listening proficiency learners, whereas the L2 approach led to greater long-term vocabulary gains than the CS approach for lower and average level listeners.
We interpret these complex findings as follows. First, the more important role of listening proficiency compared with PVK (as evidenced by the Time × GEVT × Listening interaction) suggests that these two variables each play a different role in vocabulary learning through aural input and that the relationship between them is not necessarily a straightforward one (Smith, 2019). In other words, learners' vocabulary size may not be a wholly reliable indication of their ability to understand spoken input. Furthermore, even though we assessed PVK through an aural test, that test might not have fully captured learners' ability to recognize vocabulary in connected speech (van Zeeland, 2017).
Second, the fact that learners with the highest listening proficiency but the lowest PVK made the greatest overall vocabulary gains suggests that the ability to understand spoken input (as measured by our comprehension tests) despite lower PVK brings benefits for learning. Such learners resemble those in the studies by van Zeeland and Schmitt (2013b) and Bonk (2000), who achieved higher levels of listening comprehension than might have been expected from their vocabulary size. Commenting that "some L2 listeners seem to cope better with unknown vocabulary than others" (p. 473), van Zeeland and Schmitt also acknowledge the possibility of those learners having "better metacognitive control of the comprehension process using effective combinations of different cognitive processes such as inferencing, elaborating, monitoring, evaluating, and predicting" (p. 474). In turn, that approach, which seems to involve more active and strategic engagement with the listening input, may lead to greater vocabulary learning than was the case for those in our study who had higher levels of both PVK and listening proficiency. Learners with lower levels of listening proficiency, by contrast, made greater gains as their PVK level increased. That suggests that where listening proficiency was less developed, more PVK was needed to enable vocabulary learning from aural input. In such cases, PVK perhaps facilitated inferencing and the use of other strategies that helped learners to work out the meaning of any unknown words in the input, but these processes functioned less effectively when PVK was too low (see Graham, Santos, & Vanderplank, 2010, for similar arguments). At higher levels of listening proficiency, however, higher PVK may lead to smaller vocabulary gains, arguably because learners have to work less hard to understand the input and therefore process it less deeply.
Looking at the impact of PVK in relation to each of the different types of vocabulary explanations, CFoF emerged as the most balanced approach, that is, the one with the greatest learning gains regardless of learners' PVK level. The crosslinguistic information provided about vocabulary items may have shifted the attentional direction (Ellis, 1999) of learners in the CFoF group away from the general meaning of the teacher's explanations to the target words themselves, making those words more salient and hence encouraging greater noticing (Schmidt, 1990). It may also have encouraged deeper processing and greater evaluation of the target items, insofar as their uses in the L1 and L2 were compared and contrasted. The approach thus seemed to prompt a greater involvement load, as described by Hulstijn and Laufer (2001), and thence better and more durable learning. Lower PVK learners in the CFoF group, however, may have had a more limited understanding of the crosslinguistic information provided, meaning that the gains they made were short-term rather than long-term.
The benefits of the CFoF approach for higher PVK learners stand in contrast to how such learners fared in the L2, CS, and NE groups, where higher PVK levels were associated with smaller learning gains, perhaps because those conditions did not provide the additional linguistic information that seemed to be helpful for higher PVK learners in the CFoF condition. The CS condition, furthermore, was less helpful than the L2 approach for the long-term learning of higher PVK learners. Learners with a lower level of PVK, however, benefited more from the CS approach than from the L2 approach. This last finding is in line with Lee and Levine (2020), who found that intermediate learners in their study benefited significantly more from CS. Both findings are consistent with Kroll and Stewart's (1994) revised hierarchical model of bilingual memory. According to that model, although at lower proficiency levels L2 words need to be attached to their L1 translations before the underlying mental concepts can be accessed, direct conceptual links between L2 words and conceptual representations can be created once higher proficiency levels are reached. Learners with a larger vocabulary size in the L2 group (and also those in the CFoF group) may have gone beyond the early stage of language learning and been able to establish direct conceptual links between the target L2 words and their concepts, therefore understanding the L2 explanations and the crosslinguistic explanations better and hence experiencing significantly larger vocabulary gains compared with learners who had a smaller vocabulary size.
Turning to listening proficiency, the CFoF approach was again the most beneficial for vocabulary learning, regardless of proficiency level. The long-term learning benefits for the lower listening proficiency learners in the CFoF condition may have arisen because they gained from the crosslinguistic explanations the kind of information that they were less able to extract for themselves from the listening passage alone. That may have been particularly true for learners who had lower listening proficiency but higher PVK; as noted above, higher PVK was linked with better long-term learning in the CFoF condition. The short-term learning gains for higher listening proficiency learners in the CFoF group may similarly relate back to the finding that learners with higher listening proficiency and lower PVK made the greatest gains overall, but that lower PVK was associated with short-term gains in the CFoF group.
In the other conditions, however, and in contrast to what was found for PVK, the higher learners' listening proficiency was before the intervention, the greater the short-term vocabulary gains they made. For the CS group, where learners had access only to the teacher's L1 translation of vocabulary items, higher listening proficiency may have allowed them to supplement such basic information with information gained from the listening passage about the items' use. That may also have been the case for higher listening proficiency learners in the NE group, who had no explanations of vocabulary items at all. For the L2 group, higher listening proficiency probably not only facilitated greater understanding of the teacher's explanations but perhaps also allowed learners to extract information from the passage itself and from the example sentence provided by the teacher, which presented target items in context.
These findings therefore, although confirming the benefits of the CFoF approach for vocabulary learning reported in Zhang and Graham (2019), constitute an important extension to that study, by showing how forms of instruction for aural input interact with learners' PVK and listening proficiency levels.

Limitations and Future Directions
Our study was set in a high school in China, where it is not unusual for teachers to provide vocabulary explanations alongside meaning-focused comprehension activities, as also seems to be true in other contexts with learners of a similar age (Pujadas & Muñoz, 2019). We acknowledge, however, that our findings may not extend to contexts where listening to aural input is less closely directed by a teacher. Similarly, in our study each group experienced only one form of treatment and learners had to remain in intact classes, as is often the case in school-based investigations, which did not permit random assignment to different treatment groups. This poses a limitation. The same is true of the use of a different test format at pretest compared with posttest and delayed posttest (see the section above, Vocabulary Posttests and Delayed Posttests). We addressed these limitations, however, first by using generalized linear mixed effects models to analyze the data, which controlled random effects due to repeated testing and individual differences. Second, allowing learners' PVK and listening proficiency, assessed at baseline, to interact with the other two predictors (test time points and treatment conditions) in the analysis further addressed the possible limitations arising from the lack of random assignment of students to different treatment groups. Furthermore, our findings, using PVK and listening as separate, continuous variables indexing general language proficiency, provide a clearer picture of how both variables interact with vocabulary learning through listening than previous studies using categorical variables and combined, general proficiency measures (e.g., Hennebry et al., 2013; Lee & Levine, 2020; Tian & Macaro, 2012; Vidal, 2011).
We can therefore conclude with some confidence that a CFoF approach is the most helpful type of post-listening teacher explanation for learners regardless of their PVK and listening proficiency. Comparing just the CS and L2 groups, the CS approach was more helpful for lower PVK learners and more proficient listeners than the L2 approach was. Learners with a higher level of PVK and those with a lower level of listening proficiency, however, benefited more from the L2 approach than from the CS approach. Our potentially most interesting finding, however, relates to the importance of listening proficiency as a factor influencing vocabulary learning through aural input. It would be useful for future research to establish more precisely how important this factor is. Future research might also consider whether the impact of listening proficiency, PVK, and teaching approach is similar for collocations and single words.
In addition, the fact that vocabulary gains were greatest in the long term for learners with higher listening proficiency and lower PVK is worthy of further exploration, to establish why and how such students are able to learn particularly well from aural input. These findings provide important evidence of the complex interplay of vocabulary knowledge, listening proficiency, and instructional conditions, with implications for theories of vocabulary learning. They underline the potential relevance of the involvement load hypothesis (Hulstijn & Laufer, 2001) for learning through aural input, an issue largely overlooked by previous research. That is, our findings point to the importance of conditions, such as explicitly contrasting the L2 forms with the equivalent L1 forms, that seem to facilitate deeper processing and hence better learning. For vocabulary learning during listening, such conditions may involve aural input and learning activities that require some degree of effortful or strategic engagement without being so far beyond learners' level of vocabulary knowledge and listening proficiency that such engagement is inhibited.

Conclusion
In conclusion, these findings, which are of relevance beyond the Chinese learners of English we studied, add to our understanding of vocabulary learning through listening, of the potential value of CFoF, and of the role of listening proficiency and prior vocabulary knowledge within such approaches. At a pedagogical level, learners' vocabulary size and listening proficiency should be taken into consideration when planning activities to enhance vocabulary learning through listening, with due attention given to the benefits that might be gained from an approach using CFoF.

Final revised version accepted 3 February 2020
Notes
1 The vocabulary posttests and delayed posttests were scored either right or wrong. We acknowledge that these tests could have been scored ordinally (e.g., by giving half marks), as suggested by a reviewer. However, scoring ordinally would have required us to use ordinal logistic regression. We already had a rather complex fixed effects structure with three-way interactions between predictor variables in the generalized linear mixed effects models; adding an additional level to the outcome variable would, we believe, have overcomplicated the statistical analyses.
2 As commented by a reviewer, lower levels of vocabulary learning for the NE group are perhaps unsurprising; that this was true regardless of PVK is still noteworthy, however.

Open Research Badges
This article has earned an Open Materials badge for making publicly available the components of the research methods needed to reproduce the reported procedure. All materials that the authors have used and have the right to share are available at https://www.iris-database.org/. All proprietary materials have been precisely identified in the manuscript.

Figure 2 Example item for GEVT (general vocabulary knowledge test).

Figure 3 Example item for the vocabulary posttest and delayed posttest.

Figure 4 Effect plot for Time × GEVT (General Vocabulary Knowledge Test).

Table 1
Descriptive statistics for the baseline tests and the vocabulary tests
Note. The maximum score for the GEVT and listening test is 100.00; that for the pretest, posttest, and delayed posttest is 60.00.

Table 2
Results for the final converged mixed effects model

Table 6
Variation in impact of different vocabulary explanations according to level of preexisting vocabulary knowledge (PVK) and listening proficiency, organized by proficiency level

Table 7
Variation in impact of different vocabulary explanations according to level of preexisting vocabulary knowledge (PVK) and listening proficiency, organized by vocabulary explanation group