Should ChatGPT Be Banned in Schools?

As 2023 dawns, the hot topic in education circles is the artificial intelligence (AI) tool ChatGPT and its use in schools and universities.

Early last month, New York City’s Department of Education banned its use on school devices and networks. Last week, Seattle Public Schools jumped on the bandwagon, banning ChatGPT and six other potential “cheating sites.” Soon after, Sciences Po, one of France’s top universities, announced that “without transparent referencing, students are forbidden to use the software for the production of any written work or presentations, except for specific course purposes, with the supervision of a course leader,” though it did not specify how it would track usage.

On the other hand, a group of professors from the University of Pennsylvania argued that “banning artificial intelligence-driven chatbots is a practical impossibility, so teachers should consider ways to embed them into the learning process.” In their view, banning ChatGPT is like prohibiting students from using Wikipedia or spellcheckers: “It’s hard to believe that an escalating arms race between digitally fluent teenagers and their educators will end in a decisive victory for the latter.”

The Pennsylvania professors are correct when they say “AI is not coming. AI is here. And it cannot be banned. So, what should we do?”

First, it is important to understand what these tools are and what they can and cannot do. To be sure, they are capable of generating coherent answers, but while the output is plausible, is it credible?

ChatGPT is an artificial text generator, the latest in a long line of work in natural language processing (NLP). It is quite sophisticated, capable of taking a wide range of input prompts and generating coherent text output in response. It creates its responses based on probabilistic combinations of the vast array of text on which it was “trained,” leading some scholars to describe tools like it as “stochastic parrots.” Its outputs are capable of defeating standard plagiarism detectors, such as Turnitin, because the text generated is truly original—or at least not written verbatim elsewhere. But originality is no guarantee of the quality of an answer to a question.

The quality of ChatGPT’s outputs is a function of the amount of data used in its creation, which is vast. Building and training the model has also been an expensive exercise, using large amounts of computer time (and power). The resource costs of making incremental changes to its knowledge base stand as a limiting factor. It is not like a search engine, scanning all available data at the time a question is posed to create its output; it draws its responses from a fixed set of inputs collected up to a given point in time (before its November 2022 release, in the current version). So it cannot provide credible output on new and rapidly developing topics, because these cannot have been in its training set.

The quality of its output also depends on the precision of the prompt. For general prompts on well-settled matters, it can provide some remarkably credible outputs. When I asked it to provide a curriculum for an undergraduate operations management course, it provided a classic set of topics that one could find as the chapter headings of virtually every available textbook on the subject. But when I asked it to provide a referenced academic article on a highly specific topical research subject, the output was garbage. Nicely written and (apparently) correctly referenced, but, nonetheless, garbage. Because ChatGPT is not a search engine, the articles it “cited” did not actually exist. The responses contained the names of some reputable scholars in the field (and many that were fake), but the references were “created” for the responses. Nor did the responses capture the complex nuances of the current debate on the topic. This suggests that, for now, the tool is good for high-level, rote-learning exercises on well-known topics but will struggle when given a complex question requiring critical thinking on current matters. Later versions will inevitably get better.

The challenge for educators is therefore to revisit their methods of teaching and assessment. Regarding assessment, written work is cheap to grade, but it is now harder to attribute authorship. If we are to truly assert that our students have mastered core learning objectives, the value of face-to-face interactive and interpersonal assessment increases (something of which Socrates was very much aware). Ironically, NLP tools undermine the business case for cheap, massive online learning courses, because credible assessment is no longer cheap.

Nonetheless, there are many ways in which NLP tools may assist students with their learning. Both educators and students need to be aware of what distinguishes these tools, as well as their strengths and limitations. Then there will be less to fear from them and (hopefully) less misuse of them in educational contexts.
