The Impact of GenAI on Research and Development: Implications for the European Parliament, the Responsibility of Scientists and the Artifacts of Research
Since ChatGPT’s arrival in November 2022, it seems that there’s no part of the research process that chatbots haven’t touched. Generative AI (genAI) tools can now perform literature searches; write manuscripts, grant applications and peer-review comments; and even produce computer code. Yet because the tools are trained on huge data sets, which are often not made public, these digital helpers can also clash with ownership, plagiarism and privacy standards in unexpected ways that current legal frameworks cannot address. And as genAI, overseen mostly by private companies, increasingly enters the public domain, the onus is often on users to ensure that they are using the tools responsibly.
Ed Newton-Rex, a former AI executive who now runs the ethical-AI nonprofit Fairly Trained, calls opt-outs “fundamentally unfair to creators”, adding that some creators might not even know when opt-outs are offered. He says it is encouraging to see the DPA calling for opt-ins instead.
Academics often have no say in how their data are used or whether existing models learn from them. And misuse of open-access papers is more difficult to prosecute than misuse of a piece of art or music. Ben Zhao, a computer-security researcher at the University of Chicago in Illinois, says that most opt-out policies “are at best a hope and a dream”. Many researchers don’t even own the rights to their creative output, having signed them over to institutions or publishers, which in turn can enter partnerships with AI companies seeking to use their corpus to train new models and create products that can be marketed back to academics.
The European Union’s AI Act gives broad exemptions to products used in research and development, so its impact on academia is likely to be minimal. Even so, one member of the European Parliament hopes that the law will have trickle-down effects on transparency. New requirements under the act will oblige AI developers to account for how their models are trained and how much energy they use, and to comply with opt-out policies. Any group that violates the act could face substantial fines, of up to €35 million or 7% of global annual turnover for the most serious breaches.
The act acknowledges the emergence of a new reality in which artificial intelligence is present across the economy and society. “We’ve had many other industrial revolutions in the history of mankind, and they all profoundly affected different sectors of the economy and society at large, but I think none of them have had the deep transformative effect that I think AI is going to have,” he says.
Academics often sign their intellectual property (IP) over to institutions or publishers, giving them less leverage in deciding how their data are used. But Christopher Cornelison, the director of IP development at Kennesaw State University in Georgia, says it’s worth starting a conversation with your institution or publisher if you have concerns. Those entities could be better placed to broker a deal with an AI company, or to pursue one that violates the law. He says the expectation is that researchers and their institution are working towards a common goal, and that the university does not want an adversarial relationship with its faculty members.
Scientists can now detect whether visual products, such as images or graphics, have been included in a training set, and they have developed tools that can ‘poison’ data so that AI models trained on them break in unpredictable ways. “We basically teach the models that a cow is something with four wheels and a nice fender,” says Zhao, who worked on one such tool, called Nightshade. It manipulates the individual pixels of an image so that an AI model associates the corrupted pattern with a different type of image (a dog instead of a cat, for example). Similar tools are not yet available for poisoning writing.
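The pixel-level manipulation Zhao describes amounts to adding a small, structured perturbation to an image before it is shared, so that models trained on the altered copies pick up misleading patterns. The Python snippet below is a minimal toy sketch of that general idea, not the actual Nightshade algorithm, which optimizes perturbations against a specific image-generation model; the file names, the sinusoidal pattern and the perturbation strength are illustrative assumptions.

```python
# Toy sketch of pixel-level perturbation (NOT the real Nightshade algorithm).
# Assumes NumPy and Pillow are installed; "cat.png" is a placeholder file name.
import numpy as np
from PIL import Image


def add_poison_pattern(path_in: str, path_out: str, strength: float = 4.0) -> None:
    """Add a faint, structured pattern to an image's pixels.

    The change is small enough to be hard to spot by eye, but a model trained
    on many such images could learn the spurious pattern instead of the real
    features of the depicted object.
    """
    img = np.asarray(Image.open(path_in).convert("RGB"), dtype=np.float32)

    # Deterministic low-amplitude pattern: a faint diagonal ripple.
    h, w, _ = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    pattern = np.sin((xx + yy) / 3.0)[:, :, None]  # values in [-1, 1]

    poisoned = np.clip(img + strength * pattern, 0, 255).astype(np.uint8)
    Image.fromarray(poisoned).save(path_out)


if __name__ == "__main__":
    add_poison_pattern("cat.png", "cat_poisoned.png")
```

In practice, poisoning tools compute perturbations tailored to the target model so that the corrupted images push it towards a specific wrong association, rather than applying a fixed pattern such as this one.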
Specialists broadly agree that it’s nearly impossible to completely shield your data from web scrapers, tools that extract data from the Internet. However, there are some steps — such as hosting data locally on a private server or making resources open and available, but only by request — that can add an extra layer of oversight. Several companies, including OpenAI, Microsoft and IBM, allow customers to create their own chatbots, trained on their own data, that can be isolated in this way.
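As a rough illustration of the ‘open and available, but only by request’ approach, the sketch below gates a local data file behind a shared access token, so that each download is an explicit, loggable request rather than something a crawler can harvest silently. It uses only Python’s standard library; the token, port and file name are placeholder assumptions, and a real deployment would need HTTPS and proper authentication.

```python
# Minimal sketch: serve a dataset only to requests presenting a shared token.
# The token, port and file name are placeholders for illustration only.
from http.server import BaseHTTPRequestHandler, HTTPServer

ACCESS_TOKEN = "change-me"      # assumed shared secret, handed out on request
DATASET_PATH = "dataset.csv"    # assumed local file to share


class GatedDatasetHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Require an Authorization header carrying the agreed token.
        if self.headers.get("Authorization") != f"Bearer {ACCESS_TOKEN}":
            self.send_response(403)
            self.end_headers()
            self.wfile.write(b"Access by request only.\n")
            return

        # Record who fetched the data, then serve the file.
        print(f"Serving {DATASET_PATH} to {self.client_address[0]}")
        with open(DATASET_PATH, "rb") as f:
            data = f.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.end_headers()
        self.wfile.write(data)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), GatedDatasetHandler).serve_forever()
```

The point is not that a token is hard to defeat, but that access now passes through a step the data owner controls and can record.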
Abstaining from using genAI might feel like missing out on a golden opportunity. But for certain disciplines, particularly those that involve sensitive data such as medical diagnoses, giving it a miss could be the more ethical option. To use these models in health-care settings, we need better ways of making them forget what they have learnt, according to a philosopher who studies the ethics of digital technologies.
Other publishers, such as Wiley and Oxford University Press, have brokered deals with AI companies; Taylor & Francis, for instance, has an agreement with Microsoft worth US$10 million. Cambridge University Press (CUP) has not yet entered such partnerships, but is developing policies that will offer an ‘opt-in’ agreement to authors. In a statement to The Bookseller magazine discussing future plans for the CUP — which oversees 45,000 print titles, more than 24,000 e-books and more than 300 research journals — Mandy Hill, the publisher’s managing director of academic publishing, who is based in Oxford, UK, said that it “will put authors’ interests and desires first, before allowing their work to be licensed for GenAI”.
Journals published by Springer Nature, the American Association for the Advancement of Science, PLOS and Elsevier have not entered such agreements. Nature is editorially independent of its publisher, Springer Nature.
Two studies this year have shown evidence of widespread use of artificial intelligence to write scientific manuscripts and peer-review comments, despite publishers’ attempts to limit such use. According to legal scholars who spoke to Nature, academics who use chatbots in this way open themselves up to risks that they don’t fully understand. Zhao says that people who use these models often have no idea what the tools are capable of, and he wishes they would take precautions.
An OpenAI spokesperson says the company is looking into ways to improve the opt-out process. “As a research company, we believe that AI offers huge benefits for academia and the progress of science,” the spokesperson says. “We respect that some content owners, including academics, may not want their publicly available works used to help teach our AI, which is why we offer ways for them to opt out. We’re also exploring what other tools may be useful.”
AI companies are increasingly interested in developing products marketed to academics. Several have released AI-powered search engines aimed at researchers. In May, OpenAI announced ChatGPT Edu, a platform that layers extra analytical capabilities onto the company’s popular chatbot and includes the ability to build custom versions of ChatGPT.
Poisot, a computational ecologist at the University of Montreal, has built a successful career studying the world’s biodiversity. A guiding principle of his research is that it must be useful, he says, as he hopes it will be later this year, when it joins other work being considered at the 16th Conference of the Parties (COP16) to the United Nations Convention on Biological Diversity in Cali, Colombia. “Every piece of science we produce that is looked at by policymakers and stakeholders is both exciting and a little terrifying, since there are real stakes to it,” he says.

Outsourcing parts of that work to an AI worries him. “There’s an expectation that the research and synthesis is being done transparently, but if we start outsourcing those processes to an AI, there’s no way to know who did what and where the information is coming from and who should be credited,” he says.