
Why scientists have to monitor the use of generative artificial intelligence

Wired: https://www.wired.com/story/fast-forward-ai-powerful-secretive/

Living guidelines for generative AI – why scientists must oversee its use (Nature article d41586-023-03266-1)

However, most scientists don’t have the facilities or funding to develop or evaluate generative AI tools independently. Only a handful of university departments and a few big tech companies have the resources to do so. Microsoft, for example, has invested $10 billion in OpenAI, whose software was trained on hundreds of billions of words. Companies are unlikely to release details of their latest models for commercial reasons, precluding independent verification and regulation.

Artificial intelligence systems can fabricate videos that are indistinguishable from footage of real people. In the long run, such harms could erode trust between people, politicians, the media and institutions.

  1. The body should develop quality standards and certification processes for generative AI tools used in scientific practice and in society, covering at least the following aspects:
     • Accuracy and truthfulness;
     • Proper and accurate source crediting;
     • Prevention of discriminatory and hateful content;
     • Details of the training data, training set-up and algorithms;
     • Verification of machine learning (especially for safety-critical systems).
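To make the certification idea concrete, here is a minimal sketch in Python of how the aspects listed above could be tracked as a machine-readable checklist. The CertificationCheck structure, the pass/fail scheme and the example entries are illustrative assumptions, not part of any proposed standard.

```python
# Hypothetical checklist for a single generative AI tool under assessment.
# The aspect names mirror the list above; everything else is illustrative.
from dataclasses import dataclass

@dataclass
class CertificationCheck:
    aspect: str      # one of the aspects named in the guideline above
    passed: bool     # outcome of the auditing body's assessment
    notes: str = ""  # free-text justification from the auditor

def is_certified(checks: list[CertificationCheck]) -> bool:
    """A tool is certified only if every assessed aspect passes."""
    return all(check.passed for check in checks)

checks = [
    CertificationCheck("accuracy and truthfulness", True),
    CertificationCheck("proper and accurate source crediting", True),
    CertificationCheck("prevention of discriminatory and hateful content", False,
                       "harmful outputs surfaced during red-teaming"),
    CertificationCheck("details of training data, set-up and algorithms", True),
    CertificationCheck("verification of machine learning for safety-critical systems", True),
]

print(is_certified(checks))  # False: one aspect failed, so the tool is not certified
```

Under a scheme like this, a single failed aspect blocks certification; a weighted or tiered scheme would be an equally plausible design choice.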

It’s unclear whether self-regulation or legal restrictions will work in the long run. AI is advancing at breakneck speed in a sprawling industry that is continuously reinventing itself. Regulations drawn up today will be obsolete by the time they become official policy, and might not anticipate future harms and innovations.

Source: Living guidelines for generative AI — why scientists must oversee its use

Living guidelines for generative AI in research and funding, and how to implement them: openness, integrity and reproducibility of research

  1. If you have relied heavily on one generative AI tool in your work, it is advisable also to use a different one, to avoid over-reliance on a single system.

  12. Research funding organizations should always include human assessment when evaluating research funding proposals, even if they rely on generative artificial intelligence in the process.

The guidelines were developed with the help of several others; see the Supplementary information for the list of co-developers.

The committee could measure and manage risks in the way that the US National Institute of Standards and Technology does in its AI Risk Management Framework, which would require close communication with the auditing body. For example, the living guidelines might include the right of an individual to control exploitation of their identity (for publicity, for example), while the auditing body would examine whether a particular AI application might infringe this right (such as by producing deepfakes). An AI application that fails certification could still enter the marketplace (if policies don’t restrict it), but individuals and institutions adhering to the guidelines would not be able to use it.

  1. The organization and its bodies should include, but not be limited to, experts in computer science, behavioural science, psychology, human rights, privacy, law, ethics, science of science and philosophy (and related fields). They should ensure, through the composition of their teams and their procedures, that the insights and interests of stakeholders across the private and public sectors and of a wide range of stakeholder groups (including disadvantaged groups) are represented. The composition of the teams may change over time.

The FDA is an example of a body that assesses evidence from clinical trials to approve products that meet its standards. The Center for Open Science, an international organization based in Charlottesville, Virginia, seeks to develop regulations, tools and incentives to change scientific practices towards openness, integrity and reproducibility of research.

These approaches are applied in other fields. The Stroke Foundation in Australia has adopted living guidelines which allow patients to access new medicines quickly. The foundation now updates its guidelines every three to six months, instead of roughly every seven years as it did previously. Similarly, the Australian National Clinical Evidence Taskforce for COVID-19 updated its recommendations every 20 days during the pandemic, on average (ref. 5).

The Center for Open Science (ref. 6) created the TOP (Transparency and Openness Promotion) guidelines for promoting open-science practices. A metric called the TOP Factor allows researchers to easily check whether journals adhere to open-science guidelines. A similar approach could be used for generative AI tools.
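As a rough illustration of how a TOP-Factor-style metric works, the sketch below sums per-standard implementation levels into a single adherence score. The standard names, the 0–3 levels and the example ratings are assumptions for illustration, not the official TOP Factor calculation or real journal data.

```python
# Illustrative adherence score in the spirit of the TOP Factor: each standard
# is rated on an implementation level (0 = not implemented ... 3 = strictest),
# and the overall score is the sum of the levels. Ratings below are made up.
journal_policy_levels = {
    "data citation": 2,
    "data transparency": 3,
    "code transparency": 1,
    "design and analysis transparency": 2,
    "preregistration of studies": 0,
}

def adherence_score(levels: dict[str, int]) -> int:
    """Sum of per-standard implementation levels (higher = more open)."""
    return sum(levels.values())

print(adherence_score(journal_policy_levels))  # 8 for this hypothetical journal
```

An analogous score for AI tools could rate each certification aspect in the same way, giving researchers a quick, comparable view of how far a given tool meets the guidelines.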

Scientific auditing body, guidelines committee and financial investment: maintaining research independence in a field dominated by the tech industry

Financial investments need to be made. The auditing body will be the most expensive part, because it will need computing power comparable to that of a large university consortium. Although the amount will depend on the remit of the body, it is likely to require at least $1 billion to set up. That is roughly the hardware cost of training GPT-5 (a proposed successor to GPT-4, the large language model that underlies ChatGPT).

To scope out what’s needed, we call for an interdisciplinary scientific expert group to be set up in early 2024, at a cost of about $1 million, which would report back within six months. This group should sketch scenarios for how the auditing body and the guidelines committee would function.

Some of this investment might come from the public purse. Tech companies should also contribute, as outlined below, through a pooled and independently run funding mechanism.

At first, the scientific auditing body would have to operate in an advisory capacity and could not enforce the guidelines. However, we are hopeful that the living guidelines would inspire better legislation, given the interest shown by leading global organizations in our dialogues. Compare the Club of Rome, a research and advocacy organization that raises environmental and societal awareness: it has no direct political or economic power, yet it still influences international legislation on limiting global warming.

Tech companies might fear that regulations will hamper innovation, and might prefer to self-regulate through voluntary guidelines rather than legally binding ones. Yet voluntary measures have a poor track record: for example, many companies changed their privacy policies only after the European Union adopted its General Data Protection Regulation in 2016 (see go.nature.com/3ten3du). However, our approach has benefits: auditing and regulation can foster public trust.

These benefits could provide an incentive for tech companies to invest in an independent fund to finance the infrastructure needed to run and test AI systems. However, some might be reluctant to do so, because a tool failing quality checks could receive unfavourable ratings or evaluations, leading to negative media coverage and falling share prices.

Maintaining the independence of scientific research is a challenge in a field dominated by the tech industry. The membership of the auditing body and guidelines committee must be managed to avoid conflicts of interest, given that these have been demonstrated to lead to biased results in other fields (refs 7, 8). A strategy for dealing with these issues needs to be developed.

A Stanford Study Scores the Openness of AI Language Models: What They Were Trained On, How They Were Built and How They Are Used

When OpenAI published details of the stunningly capable AI language model GPT-4, which powers ChatGPT, in March, its researchers filled 100 pages. They also left out a few important details—like anything substantial about how it was actually built or how it works.

That was no accidental oversight, of course. OpenAI and other big companies are keen to keep the workings of their most prized algorithms shrouded in mystery, in part out of fear the technology might be misused but also from worries about giving competitors a leg up.

The Stanford team looked at 10 different AI systems, mostly large language models like those behind ChatGPT and other chatbots. GPT-4 from OpenAI, PaLM 2 from Google and Titan Text from Amazon are examples of popular commercial models. The report also looked at models offered by startups, among them Claude 2 from Anthropic, Command from Cohere and Inflection-1 from Inflection.

Stable Diffusion 2, an image-generation model from Stability AI, and Llama 2, the language model that Meta released in July, were examined as examples of open-source models that can be downloaded for free. (As WIRED has previously covered, these models are often not quite as open as they might seem.)

The Stanford team scored the openness of these models on 13 different criteria, including how transparent the developer was about the data used to train the model—for example, by disclosing how it was collected and annotated and whether it includes copyrighted material. The study also looked for disclosures about the hardware used to train and run a model, the software frameworks employed, and a project’s energy consumption.

The researchers found that no model achieved more than 54 percent on their transparency scale across these criteria. Overall, Amazon’s Titan Text was judged the least transparent, while Meta’s Llama 2 was crowned the most open. But even an “open source” model like Llama 2 was found to be quite opaque, because Meta has not disclosed the data used for its training, how that data was collected and curated, or who did the work.
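As a rough illustration of how such a percentage could be derived, the sketch below scores each criterion as satisfied or not and expresses the total as a share of the 13 criteria. The criterion names, the binary scoring and the example values are assumptions, not the Stanford team's actual indicators, methodology or data.

```python
# Hypothetical transparency scores for one model: 1 if the developer disclosed
# the item, 0 if not. Only a subset of criteria is shown; the divisor is the
# full set of 13 criteria used in the study described above.
criteria_scores = {
    "training data disclosed": 1,
    "data collection and annotation described": 0,
    "copyrighted material identified": 0,
    "hardware used for training disclosed": 1,
    "software frameworks disclosed": 1,
    "energy consumption reported": 0,
    # ... remaining criteria would follow in a full index
}

def transparency_percent(scores: dict[str, int], total_criteria: int = 13) -> float:
    """Share of criteria satisfied, as a percentage of the full criterion set."""
    return 100 * sum(scores.values()) / total_criteria

print(round(transparency_percent(criteria_scores), 1))  # 23.1 for this made-up model
```

Here, criteria that are not listed simply count as unsatisfied; a real index would score every criterion explicitly.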
