    THE VOCABULARY OF EXTENSIVE READING: A CORPUS ANALYSIS OF GRADED READERS

    Name:
    Kramer_temple_0225E_15442.pdf
    Size:
    15.81Mb
    Format:
    PDF
    Genre
    Thesis/Dissertation
    Date
    2023-08
    Author
    Kramer, Brandon
    Advisor
    Pinchbeck, Geoffrey G., 1967-
    Committee member
    Beglar, David
    Vitta, Joseph P.
    Nakata, Tatsuya
    Department
    Applied Linguistics
    Subject
    English as a second language
    Permanent link to this record
    http://hdl.handle.net/20.500.12613/8955
    
    DOI
    http://dx.doi.org/10.34944/dspace/8919
    Abstract
    The importance of input on language learning cannot be overstated. One method of providing input to learners at a level that is appropriate for them is called extensive reading, in which learners read an abundance of texts. In practice, for learners of English as a second or foreign language, these texts are often books that have been written and classified into a particular difficulty level, called graded readers. Previous studies of the language in these texts have been limited in size and scope, often including books from a single publisher or series. However, if these books are meant to serve as the primary source of input for students in extensive reading programs, it is important not only to better understand the language in them, but also to understand how the books within different series and from different publishers compare with one another. Therefore, in this study I investigated the single- and multi-word expressions present in graded readers for three purposes.
    First, I wished to better understand the difficulty of the texts by analyzing the vocabulary within them and learning how much vocabulary knowledge is required to reach the 95% and 98% lexical coverage thresholds. Second, I wished to investigate the multi-word expressions (MWEs) present in graded readers to better understand what MWEs students are exposed to when reading these books. Third, I investigated how the use of MWEs differs between graded readers at each level of text difficulty, as defined by the reading levels of the Extensive Reading Foundation (ERF). In order to address these problems, I utilized a large corpus of 1,872 graded readers containing 16,448,662 tokens. Using this corpus, I calculated the coverage figures for all texts within each level to determine the vocabulary required to reach the 95% and 98% levels of coverage. These coverage figures were calculated using two kinds of lists, frequency- and difficulty-based, each meant to represent learner word knowledge.
The frequency-based lists were the New General Service List (New GSL; Brezina & Gablasova, 2015), another list by the same name, which I refer to as the NGSL (Browne, 2014), and Nation’s BNC/COCA list (2020) based on the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA). The difficulty-based list was the Scale of English Word Knowledge–Japanese (SEWK-J), a word list designed to estimate vocabulary difficulty for Japanese learners of English (Mizumoto et al., 2021; Pinchbeck, 2019). The results of the single-word analyses showed that, at the minimum 95% threshold of known vocabulary, graded readers start to become available at around rank 1,700 in the lemma-based New GSL, rank 1,250 in the flemma-based NGSL, and the first 1,000-word level of the level-6 word family-based BNC/COCA lists (based on the 25th percentiles for ERF level 1 using those lists). Studying beyond those ranks and levels should give students access to a wide range of graded readers, both at the 95% and 98% coverage thresholds, unless using the New GSL, which was much more limited in its ability to provide coverage. The median rank needed for sufficient coverage rises with each ERF level, no matter what list is used. There is also considerable overlap between levels, allowing learners to move between levels easily, as far as lexical requirements are concerned. These findings indicate that ERF levels incrementally guide learners towards more and more authentic language and texts. Similarly, the SEWK-J provides coverage of the majority of books, making it suitable for comparing a wide range of books together under the same framework. Differences between ERF levels in the SEWK-J ranks required to reach 95% and 98% were more less noticeable than those for the pedagogically focused frequency-based lists. Next, I investigated the degree to which publisher-declared headword counts are representative of the number of headwords in each graded reader.
Using the headword ranges provided by publishers tends to overestimate the number of word types needed for 95% coverage, except at the lowest ERF level. If 98% coverage is expected, then a general trend towards underestimation was found at the lowest ERF levels. Following up on these single-word analyses, I then investigated the MWEs within the graded reader corpus (GRC) to produce a list of the most frequent MWEs, which I compared with those in a large comparison corpus, the COCA. These results indicated that graded readers are a good source of 2-, 3-, 4-, and 5-grams, with more occurring in graded readers than in the COCA. Next, I examined the degree to which the most useful MWEs were included, defined as the MWEs in the Phrasal Expressions List (PHRASE; Martinez & Schmitt, 2012) and the Phrasal Verbs Pedagogical List (PHaVE; Garnier & Schmitt, 2015). Graded readers tended to include the most pedagogically important MWEs and phrasal verbs at all ERF levels. Those PHRASE and PHaVE list items that were most common in the large reference corpora used in their creation were also found to be most common in the GRC, suggesting that graded readers are a good source of comprehensible input using these forms. Finally, using studies of L2 speaking and writing at different levels of proficiency as a guide (Siyanova-Chanturia & Spina, 2020; Tavakoli & Uchihara, 2020), I conducted an exploratory investigation into whether MWE usage in graded readers follows similar trajectories as graded reader difficulty levels increase. It was found that 2-grams that are infrequent and strongly associated in unsimplified text tend to become more common as ERF levels increase.
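
The lexical coverage figures described in the abstract can be illustrated with a minimal sketch. This is not the procedure used in the dissertation: the function name `rank_needed_for_coverage` and the toy data are hypothetical. It also assumes that each token is matched to the ranked list by exact form, whereas the lists named above group words by lemma, flemma, or word family. The sketch finds the smallest list rank at which the cumulative share of known tokens in a text reaches a given threshold such as 95% or 98%.

```python
from collections import Counter


def rank_needed_for_coverage(tokens, ranked_words, threshold):
    """Return the smallest 1-based list rank whose words, together with all
    higher-priority (lower-numbered) ranks, cover at least `threshold` of the
    tokens. Return None if the list can never reach the threshold.

    Assumption: tokens are matched by exact form; no lemmatization is done.
    """
    rank_of = {word: rank for rank, word in enumerate(ranked_words, start=1)}
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return None

    # Number of running tokens contributed by each list rank.
    tokens_at_rank = Counter()
    for word, n in counts.items():
        rank = rank_of.get(word)
        if rank is not None:  # off-list words never count as covered
            tokens_at_rank[rank] += n

    covered = 0
    for rank in sorted(tokens_at_rank):
        covered += tokens_at_rank[rank]
        if covered / total >= threshold:
            return rank
    return None


# Toy example (hypothetical data, not drawn from the graded reader corpus).
ranked_words = ["the", "cat", "sat", "on", "mat"]
tokens = "the cat sat on the mat the cat".split()

print(rank_needed_for_coverage(tokens, ranked_words, 0.75))
print(rank_needed_for_coverage(tokens, ranked_words, 0.95))
```

Applied to every book at an ERF level, a function of this kind would yield one required rank per book, from which summary values such as the median or the 25th percentile could then be taken.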
    Collections
    Theses and Dissertations
