Artificial Intelligence and the Politics of Science: The promise and pitfalls of AI-drafted policy briefs, and how best to safeguard them
Answers to these questions are urgently needed. Powerful language models are widely used in research and technological development, and are now available both commercially and as open-source releases. Policymakers are already experimenting with publicly available artificial-intelligence tools. Legislative staff members in the United States are trialling OpenAI's GPT-4 (see go.nature.com/3zpwhux) and, reportedly, other unapproved and potentially less reliable AI tools. In June, the US House of Representatives placed limits on the information that offices can share with chatbots.
Concerns have been raised that AI-drafted publications could flood the system and contaminate databases of preprints and journal submissions14. Policy information produced by AI-based tools carries a similar risk. Infusing political debates with biased or fabricated information, presented in a seemingly scientific manner, could create confusion and tip perceptions of contested policy issues. Targeting policymakers with disinformation can be an effective strategy for diverting attention and sowing confusion. Such attacks threaten not only those who provide science advice, but the wider online information ecosystem as well. Ensuring that systems used to produce science advice are not compromised by disinformation or 'data-poisoning' attacks might require greater oversight of, and insight into, the training data and the training process. This is a whole-of-sector issue, but in the first instance, coordinating and advisory bodies (such as the US Office of Science and Technology Policy) should play a catalysing role by working with key research funders.
We do not propose that policy briefs be drafted entirely by LLM-based tools; rather, AI could be used to facilitate parts of the process. Human reviewers and policy professionals still have an essential part to play in creating policy papers, providing the quality control that ensures credibility, relevance and legitimacy. Yet, as generative AI tools improve, they could be used to produce first drafts of discrete sections, such as plain-language summaries of technical information or complex legislation.
Automated Evidence Synthesis in Health and Medicine: The case of the Cochrane Library and the use of machine learning
Current evidence searches are time-consuming and involve a lot of judgement. Hard-pressed science advisers must take what they can get. But what if the searches could be more algorithmic?
Systematic reviews — such as Cochrane reviews in health and medicine — identify a question of interest and then systematically locate and analyse all relevant studies to find the best answer (see www.cochranelibrary.com). For example, one recent review examined evidence on whether healthy-eating initiatives were successful in young children, finding that they can be, although uncertainties remain2.
As publishers develop other analytical tools for their databases, they might also create their own evidence-synthesis tools, but these will be limited by the scope of their coverage. If such tools are developed only by the private sector, access could be restricted in low- and middle-income countries that cannot afford to pay for them. Large-scale automated evidence synthesis requires access to, and interoperability between, databases, and therefore government collaboration.
Increasingly, machine learning can automate the search, screening and data-extraction processes that form the early stages of systematic reviews5. For example, LLMs such as the one behind Semantic Scholar's TLDR feature can summarize large corpuses of text, a handy capability for sifting the scientific literature. AI tools could be especially useful for making sense of emerging domains of research, in which review papers and disciplinary journals might be lacking. Natural language processing and graph algorithms can be used to find emerging clusters of research in the broader literature. Nonetheless, assessing data quality and drawing conclusions from the amassed evidence still typically require human judgement.
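To make this concrete, the sketch below groups a handful of paper abstracts into topical clusters using TF-IDF vectors and k-means. The toy corpus and the cluster count are illustrative assumptions; real evidence-synthesis pipelines would work at far larger scale and with richer document embeddings.

```python
# Minimal sketch: surface clusters of related papers from their abstracts.
# Corpus, vectorizer settings and cluster count are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

abstracts = [
    "mRNA vaccine efficacy in older adults over six months",
    "Lipid-nanoparticle delivery of mRNA vaccine payloads",
    "School-based healthy-eating interventions for young children",
    "Parental engagement in early-childhood nutrition programmes",
]

# Represent each abstract as a TF-IDF vector, then group similar ones.
vectors = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for label, text in sorted(zip(labels, abstracts)):
    print(label, text)
```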
Other parts of the advisory process could be automated, too. AI tools can support 'solution scanning', the process of listing potential policy options. Take, for example, policies for reducing shoplifting. When prompted to list potential options, ChatGPT can identify topics such as employee training and store layout and design. Advisers can then weigh the relevant evidence in these areas. Some options will be missed, and others will be surfaced that conventional approaches would not have found. Which dimensions of credibility matter most might depend on the policy question and context.
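As a rough illustration of how an adviser might automate such a scan, here is a hedged sketch using OpenAI's Python client. The model name and prompt wording are assumptions, and any output would still need expert vetting against the evidence.

```python
# Minimal sketch of LLM-assisted 'solution scanning'; not a production tool.
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "List ten candidate policy options for reducing shoplifting. "
    "Return one option per line, with no commentary."
)
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; any chat-capable model would do
    messages=[{"role": "user", "content": prompt}],
)

# Advisers would then assess the evidence behind each suggested option.
for option in response.choices[0].message.content.splitlines():
    print(option)
```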
Automation could also address another common problem: limited language skills. Science advisers who speak English have an advantage, because it is the main language of science. But a great deal of policy-relevant literature is published in other languages. One analysis8 of the biodiversity-conservation literature revealed that more than one-third of papers were published in languages other than English, including Spanish, Portuguese, Chinese and French. AI-powered machine translation could give advisers who face a language barrier access to this global evidence.
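As a small illustration, the sketch below translates a Spanish abstract into English with an open machine-translation model (the Helsinki-NLP opus-mt checkpoints on Hugging Face); the example sentence is invented.

```python
# Minimal sketch: translate a non-English abstract for evidence screening.
from transformers import pipeline  # pip install transformers

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")
abstract_es = "La deforestación reduce la diversidad de aves en la Amazonía."
print(translator(abstract_es)[0]["translation_text"])
```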
Many journals use standardized formats for reporting study results, but these formats vary between publishers. Policy-relevant information also comes from diverse sources, including international agencies, non-governmental organizations, industry and working papers. Developing automated methods to identify specific findings amid such diverse presentation is difficult. For example, the effect size of an intervention over a specific period is crucial information, yet it can be omitted from the main text. Presenting results and research methodology more consistently would help. In medical and life-sciences research, for instance, journals published by Cell Press use a structured reporting format called STAR Methods (see go.nature.com/3ptjqcf).
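The difficulty becomes apparent when one tries to extract even a single well-defined quantity. The sketch below pulls a relative risk and its confidence interval out of one hypothetical sentence; the pattern is an assumption and would fail on the many other ways journals report the same statistic, which is precisely the problem.

```python
# Minimal sketch: extract an effect size from one reporting style.
# The regex covers a single, hypothetical format; real formats vary widely.
import re

pattern = re.compile(
    r"RR[ =]+(?P<rr>\d+\.\d+)\s*\(95% CI[ :]*(?P<lo>\d+\.\d+)[–-](?P<hi>\d+\.\d+)\)"
)
text = "The intervention reduced incidence (RR = 0.82 (95% CI 0.71–0.95)) over 12 months."
match = pattern.search(text)
if match:
    print(match.group("rr"), match.group("lo"), match.group("hi"))  # 0.82 0.71 0.95
```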
Currently, conducting systematic reviews requires searching across databases, mostly proprietary ones, to identify relevant scientific literature. The choice of database can have a large impact on the outcome. Requirements for government-funded research to be published as open access could make results easier to retrieve. Eliminating paywalls for research topics that governments deem funding priorities would ease the creation of evidence databases, although this would need to be aligned with copyright law.
In one experiment by the publisher Elsevier, an LLM system was constructed that referenced only published, peer-reviewed research. Although the system managed to produce a policy paper on lithium batteries, challenges remain. The resulting text was bland and pitched at a high level of abstraction, very different from the original synthesis that the paper was based on, and far from the briefs that policymakers need. Even so, the system demonstrated some important design principles. For instance, constraining it to generate only text that refers to scientific sources ensured that the resulting advice credited the scientists being cited.
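In the same spirit, here is a minimal retrieval-augmented sketch that constrains a draft to a retrieved source and demands explicit credit. This is not Elsevier's system; the corpus, citation keys and prompt wording are illustrative assumptions.

```python
# Minimal sketch of source-constrained drafting: retrieve, then force citation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = {
    "Smith2021": "Solid-state electrolytes improve lithium-battery safety.",
    "Chen2022": "Cathode recycling lowers the lifecycle cost of lithium cells.",
}

question = "What does peer-reviewed research say about lithium-battery safety?"
vec = TfidfVectorizer().fit(list(corpus.values()) + [question])
scores = cosine_similarity(
    vec.transform([question]), vec.transform(list(corpus.values()))
)[0]
best = max(zip(corpus, scores), key=lambda pair: pair[1])[0]

# The prompt restricts generation to the retrieved source and requires credit.
prompt = (
    "Using ONLY the source below, draft one paragraph of policy advice "
    f"and cite it as [{best}].\n\n"
    f"Source [{best}]: {corpus[best]}\n\nQuestion: {question}"
)
print(prompt)  # this prompt would then be passed to an LLM of choice
```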
Say the UK Parliament commissioned a POSTnote to summarize the latest research on COVID-19 vaccines. Instead of a single publication, POST could produce a multilayered document that automatically tailored itself to different politicians. The version sent to one politician could highlight how people in their constituency contributed to the science of vaccine manufacturing or COVID-19. It could also provide targeted information on infection rates in their own region.
Another possible dimension is the level of scientific explanation, such as how vaccines work. Science-savvy politicians could receive specialist knowledge; those with no scientific background could receive a lay version. Readers could adjust the level of technical detail themselves.
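A toy sketch of such a multilayered brief is given below: the same evidence rendered at a reader-chosen level of technical detail, with a placeholder slot for local statistics. The content strings and region data are invented.

```python
# Toy sketch of a 'multilayered' brief with reader-selectable detail.
LAYERS = {
    "lay": "Vaccines teach the immune system to recognise the virus.",
    "specialist": (
        "mRNA vaccines encode the spike protein, eliciting neutralising "
        "antibodies and T-cell responses."
    ),
}

def render_brief(region: dict, level: str = "lay") -> str:
    body = LAYERS[level]
    local = f"Current infection rate in {region['name']}: {region['rate']}%."
    return body + "\n" + local

# Invented region statistics, for illustration only.
print(render_brief({"name": "West Midlands", "rate": 1.4}, level="specialist"))
```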
Artificial Intelligence and its Relevance in Government: Bias, sensitive information and preparing for upcoming summits
Researchers have shown that different language models lean in particular political directions on both social and economic fronts. Some of these biases are picked up from the data on which the models are trained. Such biases can then have implications for how models perform on specific tasks, such as detecting hate speech and misinformation13. Models also exhibit biases relating to race, religion, gender and more.
Such processes should be conducted by institutions with mechanisms in place to ensure robust governance, broad participation and public accountability. For example, national governments could build on current efforts such as the US What Works Clearinghouse and the UK What Works Network. The tools could also be developed by international bodies, such as the UN scientific and cultural organization UNESCO. Care should be taken to foster collaboration between countries of all income levels. It is crucial to ensure that low-income countries have access to these tools and to science information, and equally crucial to establish a consistent system for evidence synthesis that meshes with national and international priorities.
Policy briefings often contain classified or otherwise sensitive information, such as the details of a defence acquisition or draft findings from a public-health study, which needs to remain private until cleared for public dissemination. Advisers who use publicly available tools risk revealing restricted information, a concern that has already complicated AI deployment elsewhere in government and in the private sector. Guidelines need to be established for what documents can be fed into external LLMs, and internal models need to be developed on secure servers.
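What such guidelines might mean in practice can be hinted at with a toy pre-flight check that blocks marked documents from leaving the building. The marker strings and routing logic are invented; real classification schemes and policies differ by government.

```python
# Toy pre-flight check before a document is sent to an external LLM.
# Markers and routing are illustrative assumptions, not real policy.
SENSITIVE_MARKERS = ("OFFICIAL-SENSITIVE", "SECRET", "NOT FOR RELEASE")

def cleared_for_external_llm(document: str) -> bool:
    text = document.upper()
    return not any(marker in text for marker in SENSITIVE_MARKERS)

doc = "DRAFT - NOT FOR RELEASE: interim public-health findings."
if cleared_for_external_llm(doc):
    print("send to external tool")
else:
    print("route to an internal model on a secure server")
```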
These issues should be discussed at the AI summits planned for November and for June in London. Other venues include the G20 Chief Science Advisers' Roundtables and major conferences, such as those hosted by the International Network for Government Science Advice and the European Parliamentary Technology Assessment network. Early consideration of which sector takes the lead will be important, given the concerns being voiced over regulatory capture and government competence in the broader AI domain. Such summits could help to build consensus on these issues.
Around the world, artificial intelligence is already being used in science education. Students at schools and universities regularly use LLM tools to answer questions, and teachers are starting to recognize that curricula and teaching methods will need to change to take this into account.
Artificial intelligence itself has also been changing. In the past few years, there has been a rush to develop machine-learning algorithms that can help to discern patterns in scientific data sets, but the 2020s have brought a new age of generative AI tools that are pre-trained on vast data sets.