At the International Conference on Foreign and Second Language Acquisition in Szczyrk, Krystyna Warchał and Dariusz Jakubowski presented the results of their recent study, "Degree-related writing before and after the public release of ChatGPT: A pilot corpus study of BA and MA theses by L2 English majors". The contribution pilots a project exploring how degree-related writing and research practices have changed since LLMs became publicly available.
The study draws on a corpus of 240 English-language BA and MA theses written by L2 English majors in linguistics or literary studies and submitted to the Faculty of Humanities at a large public university in Poland. The material is organised into datasets representing two timeframes, 2021-2022 and 2024-2025, each comprising 60 high-graded and 60 low-graded theses. The analysis addresses the following questions: (i) What linguistic differences can be observed between theses completed before and after the public release of ChatGPT? (ii) What patterns, if any, distinguish high-graded from passing-grade theses across the two periods?
Focusing on lexical choice, syntactic complexity, and indicators of textual coherence and logical relations, the analysis employs LancsBox (Brezina & Platt, 2024), Sketch Engine (Kilgarriff et al., 2014), and an automated linguistic text analysis tool developed for this project. The findings indicate significant lexical differences between texts produced in the two timeframes. Beyond lexical differences, keyword analysis points to possible differences in syntactic patterns (as evidenced by specific frame markers, reporting verbs, non-finite clauses, and signals of noun phrase complexity) and in preferred indicators of cohesion and logical relations (as evidenced by summary words and conjuncts). These differences are more strongly marked in low-graded than in high-graded texts. Measurements applied in the automated text analysis show that more recent texts are more lexically diverse than those submitted in 2021-2022. Again, this difference is more strongly marked in low-graded theses, which are more lexically diverse than high-graded texts. The measurements consistently indicate that low-graded MA theses are a particularly affected group.
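The abstract does not say which lexical diversity measures the project's tool computes. Purely as an illustration of what such a measurement can look like, the Python sketch below computes a plain type-token ratio (TTR) and a moving-average TTR (MATTR), which is less sensitive to text length. The function names, the simple regex tokenizer, and the window size are assumptions made for this example, not details taken from the study.

```python
import re


def tokenize(text):
    """Lowercase the text and extract alphabetic word tokens (assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(tokens):
    """Share of distinct word forms (types) among all tokens; 0.0 for empty input."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Moving-average TTR: mean TTR over every consecutive window of `window` tokens.

    Falls back to plain TTR when the text is no longer than one window.
    The default window of 50 is an illustrative assumption.
    """
    if not tokens:
        return 0.0
    if len(tokens) <= window:
        return type_token_ratio(tokens)
    ratios = [
        type_token_ratio(tokens[i:i + window])
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    sample = "The cat saw the dog and the dog saw the cat"
    tokens = tokenize(sample)
    print(type_token_ratio(tokens), mattr(tokens, window=5))
```

Comparing mean MATTR values across the four groups (high- and low-graded theses in each timeframe) would be one way to check a claim of the kind reported here. Plain TTR falls as texts get longer, so a windowed measure is the safer choice when theses differ in length.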
