
Scientists are embracing a blockbuster artificial-intelligence model

Nature: https://www.nature.com/articles/d41586-025-00275-0

Enhancing trust in artificial intelligence and ensuring provenance: a case study of Noah Hollmann, Samuel Müller and Frank Hutter

The best-known large language models (LLMs) are trained on more than 150 million examples of text and images, which is what enables them to answer user queries with a degree of reliability. But what if relevant real-world data do not exist in the required quantities? Can AI still provide reliable answers when trained on much less? Researchers often want to make predictions from kinds of data set that they cannot find in sufficient quantities to train a model. According to a study in Nature, there is a way forward: an AI system can achieve reliable results even when trained on randomly generated data.
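To make 'randomly generated data' concrete, here is a toy sketch in Python. It is not the authors' actual generator (the synthetic 'prior' behind their model is far richer), but it shows the idea: each draw yields a fresh table of random inputs labelled by a randomly sampled rule, and millions of such draws can stand in for scarce real-world tables.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_dataset(n_rows=100, n_features=5):
    """Sample one synthetic tabular data set: random inputs, labelled
    by a randomly drawn linear rule plus a little noise."""
    X = rng.normal(size=(n_rows, n_features))
    w = rng.normal(size=n_features)               # a random 'ground truth'
    y = X @ w + rng.normal(scale=0.1, size=n_rows)
    return X, (y > 0).astype(int)                 # binary labels

# Each call produces a different data set with its own hidden rule;
# a model trained across millions of them learns to make predictions
# on tables it has never seen, real ones included.
datasets = [random_dataset() for _ in range(3)]
```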

Hollmann and colleagues’ work is an example of necessity spurring innovation: the researchers realized that there were not enough accessible real-world data sets to train their model, and so they found an alternative approach.

Enhancing trust in artificial intelligence, along with minimizing its harms, must remain a global priority, even though US support for such work seems to have been cut back under Trump. The president has revoked an executive order that called for the US National Institute of Standards and Technology (NIST) and artificial-intelligence companies to collaborate on improving trust in, and the safety of, the technology. The new executive order drops the word ‘safety’ in favour of removing barriers to US leadership in artificial intelligence. Last November, NIST published a report on methods for authenticating AI content and tracking its provenance (see go.nature.com/42c21tn). Researchers should not allow these efforts to go to waste.

Synthetic data do not come free of risks, such as the danger of producing inaccurate results, known as hallucinations. It is therefore important that studies such as this one are replicated. Replication, a cornerstone of science, also reassures users that they can trust the results of their queries.

The work of Noah Hollmann, Samuel Müller and Frank Hutter is part of this advance. Their model, called TabPFN, is designed to analyse tabulated data, such as those found in spreadsheets. Typically, a user creates a spreadsheet by populating rows and columns with data, and uses mathematical models to make inferences or projections from those data. TabPFN can make predictions on any small data set, from those used in accounting and finance to those from genomics and neuroscience. Moreover, its predictions are accurate even though the model is trained entirely without real-world data: instead, it is trained on 100 million randomly generated data sets.
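The released model can be tried directly. The sketch below assumes the authors' open-source tabpfn Python package, which exposes a scikit-learn-style interface; the breast-cancer data set here merely stands in for any small table.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier  # assumes the authors' released package

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()       # no task-specific training required
clf.fit(X_train, y_train)      # 'fit' essentially stores the table
preds = clf.predict(X_test)    # prediction is a forward pass of the
print(preds[:10])              # network pre-trained on synthetic data
```

Because the heavy lifting happened during pre-training on synthetic data, fitting to a new table is quick: no gradient training is needed at that stage.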

The DeepSeek-R1 model seems to match the performance of o1, made by OpenAI in San Francisco, California, which is considered an industry leader in reasoning models.

It is not even the end of January, and 2025 is already proving to be a defining year for artificial intelligence (AI). On 21 January, just one day into his presidency, US President Donald Trump announced the Stargate Project, a joint venture between leading technology companies and financiers in the United States, Japan and the United Arab Emirates, which pledged US$500 billion for the development of AI in the United States.

In preliminary tests of R1’s abilities on data-driven scientific tasks, drawn from real papers in topics including bioinformatics, computational chemistry and cognitive neuroscience, the model matched o1’s performance, says Huan Sun, an AI researcher at the Ohio State University in Columbus. The team challenged both models to complete 20 tasks from a benchmark suite called ScienceAgentBench, which includes tasks such as analysing and visualizing data. Both models solved only around one-third of the challenges correctly. Running R1 through its API cost 13 times less than running o1, but R1 had a slower ‘thinking’ time, Sun notes.
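For context, running R1 ‘using the API’ means remote calls like the sketch below; the endpoint and model name follow DeepSeek's public, OpenAI-compatible interface, and should be read as assumptions rather than details from the study.

```python
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint; 'deepseek-reasoner'
# is the API name under which R1 is served (an assumption here).
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{
        "role": "user",
        "content": ("Write Python code that loads counts.csv and plots "
                    "a histogram of the 'expression' column."),
    }],
)
print(response.choices[0].message.content)
```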

Sun thinks that R1’s low running costs will encourage many more scientists to try LLMs in their research. “Almost every colleague and collaborator working in AI is talking about it,” says Sun.

Much of the excitement over R1 stems from its release as an ‘open-weight’ model, meaning that the learned connections between the different parts of its algorithm are available for others to build on. Scientists who download R1, or one of the much smaller versions also released by DeepSeek, can improve its performance in their field through extra training, known as fine-tuning. Given a suitable data set, researchers could train the model to improve at coding tasks specific to the scientific process, says Sun.
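As a rough illustration of what such fine-tuning can look like, here is a minimal sketch using the Hugging Face Transformers library; the model identifier and the toy corpus are illustrative assumptions, not the researchers' actual set-up.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# One of DeepSeek's smaller distilled releases (assumed identifier).
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A toy fine-tuning corpus of coding tasks from a scientific workflow.
corpus = [
    "Task: parse a FASTA file and count its sequences.\nSolution: ...",
    "Task: fit a dose-response curve to assay.csv.\nSolution: ...",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for epoch in range(3):
    for text in corpus:
        batch = tokenizer(text, return_tensors="pt", truncation=True)
        # Causal-language-modelling objective: the labels are the input
        # ids themselves, shifted internally by the model.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

In practice, researchers would use a real domain corpus and, for a model of any size, parameter-efficient methods such as LoRA, but the training loop above is the core of the idea.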

R1 is also showing promise in mathematics. Frieder Simon, a mathematician and computer scientist at the University of Oxford, UK, challenged both models to create a proof in the abstract field of functional analysis, and found R1’s argument more promising than o1’s. But because such models also make mistakes, benefiting from them will require researchers who have the skills to tell a good proof from a bad one.
