In the weeks following ChatGPT's launch, fears grew that students would use the chatbot to spin up passable essays in seconds. Startups responded to those fears by creating products that promise to tell whether text was written by a person or a machine.
But according to new research, which hasn't yet been peer-reviewed, it is relatively easy to fool these tools and avoid detection.
Debora Weber-Wulff, a professor of media and computing at HTW Berlin, the University of Applied Sciences, worked with a team of researchers to evaluate the ability of 14 tools, including Turnitin, GPTZero, and Compilatio, to detect text written by OpenAI's ChatGPT.
Most of these tools work by looking for hallmarks of AI-generated text, such as repetition, and calculating the likelihood that the text was generated by AI. But the team found that all the tools tested struggled to pick up ChatGPT-generated text that had been slightly rearranged by humans and obfuscated by a paraphrasing tool. The upshot: students need only lightly adapt the essays the AI generates to get past the detectors.
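To make that approach concrete, here is a minimal sketch of the kind of repetition-based heuristic such detectors build on. Everything in it, from the features to the thresholds, is invented for illustration; commercial detectors rely on trained statistical models, not hand-written rules like these.

```python
# Toy sketch of a repetition-based "AI-likeness" heuristic.
# Features and thresholds are invented for illustration only;
# real detectors rely on trained models, not hand-picked rules.
from collections import Counter


def repeated_trigram_rate(text: str) -> float:
    """Fraction of word trigrams that occur more than once."""
    words = text.lower().split()
    trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    if not trigrams:
        return 0.0
    counts = Counter(trigrams)
    return sum(c for c in counts.values() if c > 1) / len(trigrams)


def sentence_length_variance(text: str) -> float:
    """Crude 'burstiness' proxy: variance of sentence lengths in words.
    Human prose tends to vary sentence length more than model output."""
    cleaned = text.replace("!", ".").replace("?", ".")
    lengths = [len(s.split()) for s in cleaned.split(".") if s.strip()]
    if len(lengths) < 2:
        return 0.0
    mean = sum(lengths) / len(lengths)
    return sum((n - mean) ** 2 for n in lengths) / len(lengths)


def looks_ai_generated(text: str) -> bool:
    # Hypothetical decision rule: lots of repeated phrasing plus
    # unusually uniform sentence lengths pushes the verdict toward "AI."
    return repeated_trigram_rate(text) > 0.05 and sentence_length_variance(text) < 20.0
```

A sketch like this also makes the study's finding intuitive: a paraphraser that swaps synonyms and reorders clauses changes exactly the surface statistics such heuristics measure, which is consistent with the sharp drop in accuracy the researchers observed.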
"These tools don't work," says Weber-Wulff. "They don't do what they claim to do. They're not AI detectors."
The researchers evaluated the tools by writing short undergraduate-level essays on a range of subjects, including civil engineering, computer science, economics, geography, history, linguistics, and literature. They wrote the essays themselves to be sure the text wasn't already available online, which would have meant ChatGPT could already have been trained on it.
Each researcher then wrote an additional text in Bosnian, Czech, German, Latvian, Slovak, Spanish, or Swedish. Those texts were run through either the AI translation tool DeepL or Google Translate to render them in English.
Finally, the team used ChatGPT to generate two more sets of texts, which they lightly tweaked in an effort to disguise that they were AI-generated. One set was edited manually by the researchers, who reordered sentences and swapped out words; the other was rewritten by an AI paraphrasing tool called Quillbot. In the end, they had 54 documents on which to test the detection tools.
They found that while the tools were good at identifying human-written text (with an average accuracy of 96%), they fared far worse at spotting AI-generated text, especially once it had been edited. Although the tools identified ChatGPT text with 74% accuracy, that figure fell to just 42% when the ChatGPT-generated text had been slightly tweaked.
Vitomir Kovanovic, a senior lecturer at the University of South Australia who builds machine-learning and AI models, says studies like this highlight how outdated the methods used to assess student work have become.
Daphne Ippolito, a senior research scientist at Google specializing in natural-language generation, who did not work on the project, raises another issue.
She says that if automatic detection systems are to be used in educational settings, it is crucial to understand their false-positive rates, because incorrectly accusing a student of cheating can have dire consequences for their academic career. The false-negative rate matters too: if AI-generated text passes too easily as human-written, the detection system is useless.
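To see why both rates matter, here is a small worked example of the arithmetic, using invented counts rather than figures from the study:

```python
# Illustrative error-rate arithmetic for a text detector.
# All counts are invented for demonstration purposes.

human_flagged_ai = 2    # false positives: students wrongly accused
human_passed = 98       # true negatives: human text correctly cleared
ai_flagged = 42         # true positives: AI text correctly caught
ai_passed = 58          # false negatives: AI text that slips through

false_positive_rate = human_flagged_ai / (human_flagged_ai + human_passed)
false_negative_rate = ai_passed / (ai_passed + ai_flagged)

print(f"False-positive rate: {false_positive_rate:.1%}")  # 2.0%
print(f"False-negative rate: {false_negative_rate:.1%}")  # 58.0%
```

Even a false-positive rate that looks small on paper translates into real students being wrongly accused once a tool is run across thousands of submissions, which is Ippolito's point.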
Compilatio, which makes one of the tools tested, says it is important to remember that its system only indicates suspect passages.
"It is up to the schools and teachers that grade the analyzed documents to validate or question the knowledge actually acquired by the document's author, for example by putting in place additional means of investigation such as oral questioning or extra questions in a controlled classroom setting," a company spokesperson said.
"In this way, Compilatio's tools are part of a genuine teaching approach that encourages learning about good research, writing, and citation practices. Compilatio is a correction aid, not a corrector," the spokesperson added. Turnitin and GPTZero did not immediately respond to requests for comment.
We've known for some time that tools meant to detect AI-written text don't always work as advertised. Earlier this year, OpenAI unveiled a tool to detect text produced by ChatGPT, admitting that it flagged only 26% of AI-written text as "likely AI-written." OpenAI pointed MIT Technology Review to a section of its website for educators, which warns that tools designed to detect AI-generated content are "far from foolproof."
"While many publicized detection tools are not highly accurate, they are not all complete disasters either," says Tom Goldstein, an assistant professor at the University of Maryland who wasn't involved in the study, pointing out that Turnitin achieved some detection accuracy while maintaining a fairly low false-positive rate. And while studies that shed light on the limitations of AI-text-detection systems are important, it would have been helpful to extend the study's scope to AI tools beyond ChatGPT, says Sasha Luccioni, a researcher at the AI startup Hugging Face.
For Kovanovic, the whole idea of trying to detect AI-written text is flawed.
He says, "Don't attempt to detect AI. Make it so the use of AI does not pose a problem."
————————————————————————————————————————————————————————————
By: Rhiannon Williams
Title: AI-text detection tools are really easy to fool
Sourced From: www.technologyreview.com/2023/07/07/1075982/ai-text-detection-tools-are-really-easy-to-fool/
Published Date: Fri, 07 Jul 2023 11:30:00 +0000