
Data Centres have a huge water footprint

Nature: https://www.nature.com/articles/d41586-023-03316-8

How AI is changing science, in Nature’s collection (from 1 November): scientists work to stop deception by artificial intelligence

A debacle at OpenAI has highlighted concerns that commercial forces are acting against the responsible development of AI. The company behind ChatGPT fired Sam Altman, its co-founder and chief executive, on 17 November, only to bring him back five days later. The push to retain dominance is leading to toxic competition, according to Sarah Myers West. She is concerned that products are appearing before anyone knows how they will be used. “We need to start by enforcing the laws we have right now,” she says.

A picture of the pope wearing a huge jacket went global, and many people did not know it was created with artificial intelligence. Scientists are working to stop deceptive deepfakes. Read this and other stories on how AI is changing science in Nature’s collection (from 1 November). The illustration was created by Señor Salme.

AI tools, methods and data generation are advancing faster than institutional processes for ensuring quality science and accurate results. The scientific community must take urgent action, or risk wasting research funds and eroding trust in science as AI continues to develop.

How to make better prints with artificial intelligence: an error-correcting inkjet-type 3D printer

An error-correcting 3D printer can create designs such as a robotic hand, with soft plastic muscles and rigid plastic bones, in one go. Combining different materials in the same run is normally difficult. This inkjet-type printer builds 3D structures by spraying layer after layer of material, keeping an electronic eye on the print and compensating for any mishaps in the next layer. Removing the need for messy mechanical smoothing lifts limits on the materials that can be used.
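The closed-loop idea described above can be sketched in a few lines. This is my own toy simulation, not the printer's actual control code: each layer's measured shortfall is simply added to the next layer's commanded deposition.

```python
def print_layers(target_heights, deposition_error):
    """Simulate closed-loop layer correction: measure the total deposited
    so far, compare to the cumulative target, and fold the shortfall into
    the next layer's command instead of mechanically smoothing."""
    commanded, actual_total, correction = [], 0.0, 0.0
    for target in target_heights:
        cmd = target + correction                # compensate earlier error
        actual = cmd * (1 - deposition_error)    # imperfect deposition
        actual_total += actual
        # shortfall = cumulative target so far minus what was actually laid down
        correction = sum(target_heights[:len(commanded) + 1]) - actual_total
        commanded.append(cmd)
    return actual_total

total = print_layers([0.1] * 10, deposition_error=0.05)
print(round(total, 4))  # close to the 1.0 target despite a 5% per-layer loss
```

Without the feedback term, the same 5% loss would leave the part 5% short; with it, only the final layer's residual error remains.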

The Japanese habit of showing respect to non-living things, including AI, demonstrates one path towards a better relationship with our gadgets, argues anthropologist Shoko Suzuki. (Nature Human Behaviour | 5 min read)

Water resources in sub-Saharan Africa could face pressure as a result of new cloud-computing hubs. Plus, GPT-4 generates a fake data set to support bogus science, and what the OpenAI drama means for AI progress and safety.

The large language model GPT-4 can be used to produce fake data in support of a scientific claim. The fabricated data suggested that one procedure is better than another for treating an eye condition, even though the two lead to similar outcomes in real trials. Although the data don’t hold up to close scrutiny by authenticity experts, “to an untrained eye, this certainly looks like a real data set”, says biostatistician Jack Wilkinson.

Some chess puzzles tend to stump even computers. Researchers created ten different versions of AlphaZero, each with its own strategy. A ‘virtual matchmaker’ algorithm decides which agent has the best chance of succeeding on a given puzzle. The system solved more chess puzzles than AlphaZero alone: the artificial brainstorming session “leads to creative and effective solutions that one would miss without doing this exercise”, says AI researcher Antoine Cully.
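A minimal sketch of the matchmaking idea, assuming a simple success-rate heuristic (my own illustration, not the researchers' actual algorithm): record each agent's solve rate on practice puzzles, then route new puzzles to the agent with the best track record.

```python
import random

rng = random.Random(0)
true_skill = [0.2, 0.8, 0.5]   # hidden solve probability of each agent

# Training phase: every agent attempts every practice puzzle.
N_PUZZLES = 200
wins = [0] * len(true_skill)
for _ in range(N_PUZZLES):
    for agent, skill in enumerate(true_skill):
        wins[agent] += rng.random() < skill   # True counts as 1

def matchmake():
    """Route a new puzzle to the agent with the best observed record."""
    rates = [w / N_PUZZLES for w in wins]
    return max(range(len(rates)), key=rates.__getitem__)

print(matchmake())   # the strongest agent (index 1) is selected
```

A real matchmaker would condition on puzzle features rather than picking one agent globally, but the routing principle is the same.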

Towards FAIR data sets for artificial-intelligence research: consult data experts before depositing your data

Much deposited research data meets only two of the FAIR criteria: it is findable and accessible. Interoperability and reusability require sufficient information to allow data sets to be combined reliably, which is particularly important for artificial-intelligence studies.
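As a hedged illustration of what interoperability and reusability require beyond a bare file upload, here is a hypothetical machine-readable metadata record; every field name and value below is invented for illustration, not drawn from any repository's actual schema.

```python
import json

# Hypothetical metadata record: identifiers, units, provenance and licensing
# are what let independent data sets be combined reliably.
metadata = {
    "identifier": "doi:10.xxxx/example",   # persistent ID (findable)
    "title": "Surface air temperature, hypothetical station network",
    "license": "CC-BY-4.0",                # reuse terms (reusable)
    "variables": [
        {"name": "t2m", "long_name": "2-metre air temperature",
         "units": "K", "missing_value": -9999.0},   # units (interoperable)
    ],
    "provenance": {
        "instrument": "thermistor (model unspecified)",
        "processing": "hourly means, quality-controlled",
    },
    "spatial_coverage": {"lat": [40.0, 45.0], "lon": [-10.0, 0.0]},
}
record = json.dumps(metadata, indent=2)
print(record)
```

A data set deposited with only a filename satisfies "findable and accessible"; it is records like this that make combining it with other sources possible.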

More than 25,000 Earth and space scientists from more than 100 countries attend the annual conference of the American Geophysical Union. In 2015 there were fewer than 100 submissions, but in 2016 the number jumped to more than 1,200.

Researchers need to understand the training and input data sets used in the model. This includes any biases, especially when the model contributes to actions such as disaster responses, preparation, investments, or health-care decisions. Data sets that are poorly thought out or insufficiently described increase the risk of ‘garbage in, garbage out’ studies and the propagation of biases, rendering outcomes meaningless or, even worse, dangerous.

  1. Risk. If data sets are susceptible to biases, consider and manage how those biases might affect the outcome or have unforeseen consequences.

More detailed recommendations, arranged into modules for ease of use, are distributed by the American Geophysical Union in its community report.

Some areas have better coverage or fidelity of environmental data than others. Areas that are often under cloud cover, such as tropical rainforests, or that have fewer in situ sensors or satellite coverage, such as the polar regions, will be less well represented. Similar disparities across regions and communities exist for health and social-science data.

The abundance and quality of data sets are known to be biased, often unintentionally, towards wealthier areas and populations and against vulnerable or marginalized communities, including those that have historically been discriminated against7,8. In health data, for instance, AI-based dermatology algorithms have been shown to diagnose skin lesions and rashes less accurately in Black people than in white people, because the models are trained on data predominantly collected from white populations8.

Such problems can be compounded when data sources are combined, as is often necessary to provide actionable advice to the public. Assessing the impact of air pollution on the health of communities, for example, depends on combining environmental data with economic, health or social-science data.

Unintended harmful outcomes can also occur when confidential information is revealed, such as the location of protected resources or endangered species. Worryingly, the diversity of data sets now being used increases the risk of adversarial attacks that corrupt or degrade the data without researchers being aware11. Such attacks on AI and machine-learning tools are easy to mount but can be difficult to detect. Public data sets of images and other content can contain noise or interference that alters a model’s outputs and the conclusions that can be drawn from them. And if outcomes from one model serve as input for another, the errors propagate and the risks compound.

In publications, researchers should clearly document how they have implemented an AI model, to allow others to evaluate the results. Running comparisons across models and separating data sources into comparison groups are useful soundness checks. Standards and guidance are needed so that an assessment comparable to statistical confidence levels can accompany outputs; such assessments could be key to the continued use of these tools.
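One way to implement the "separating data sources into comparison groups" check is a leave-one-source-out evaluation. The sketch below is my own illustration, using a trivial mean-value predictor as a stand-in for a real model: if the error on one held-out source is far larger than on the others, the sources are probably not interchangeable.

```python
from statistics import mean

def leave_one_source_out(records):
    """For each data source, fit a trivial mean-value model on all other
    sources and measure its error on the held-out source. A large spread
    in errors flags a source that disagrees with the rest."""
    sources = sorted({src for src, _ in records})
    errors = {}
    for held_out in sources:
        train = [y for src, y in records if src != held_out]
        test = [y for src, y in records if src == held_out]
        prediction = mean(train)            # stand-in for a real model
        errors[held_out] = mean(abs(y - prediction) for y in test)
    return errors

# Toy data: (source, measured value). Source "C" is systematically offset,
# mimicking a regional bias in coverage or calibration.
records = [("A", 1.0), ("A", 1.2), ("B", 0.9), ("B", 1.1),
           ("C", 3.0), ("C", 3.2)]
errors = leave_one_source_out(records)
print(errors)   # source "C" shows a much larger held-out error
```

The same pattern works with any model in place of the mean: the point is the per-source comparison, not the predictor.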

Researchers and developers are working to make the behavior of Artificial Intelligence systems more understandable to users. In short-term weather forecasting, for example, AI tools can analyse huge volumes of remote-sensing observations that become available every few minutes, thus improving the forecasting of severe weather hazards. Humans must be able to assess the validity and usefulness of the forecasts and decide whether to use the outputs in other models or alert the public.

Explainable AI (XAI) attempts to quantify or visualize which input data featured more or less in reaching the model’s outputs for a given task. Researchers should check these explanations to make sure they are reasonable.
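One widely used technique of this kind is permutation importance: shuffle one input feature at a time and measure how much the model's error grows, so a larger increase means the model relied on that feature more. A self-contained sketch, illustrative rather than tied to any particular XAI library:

```python
import random
from statistics import mean

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Shuffle each feature column in turn and report the mean increase
    in prediction error; features the model ignores score near zero."""
    rng = random.Random(seed)

    def error(rows):
        return mean(abs(model(row) - target) for row, target in zip(rows, y))

    baseline = error(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)             # break the feature-target link
            X_perm = [row[:j] + (v,) + row[j + 1:]
                      for row, v in zip(X, column)]
            increases.append(error(X_perm) - baseline)
        importances.append(mean(increases))
    return importances

# Toy model that depends only on the first of two features.
model = lambda row: 2.0 * row[0]
X = [(float(i), float(i % 3)) for i in range(30)]
y = [2.0 * a for a, _ in X]
imp = permutation_importance(model, X, y)
print(imp)   # feature 0 matters; feature 1 scores ~0
```

Checking that the attributions match domain expectations, as the text recommends, is then a matter of inspecting the returned scores.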

Research teams should include specialists in each type of data, as well as members of communities who might be affected by the research outcomes. One example is an artificial-intelligence project that combines Traditional Knowledge from Indigenous people in Canada with data collected by non-Indigenous people to identify areas best suited for farming.

The movement across scientific fields to report study data, code and software in accordance with FAIR guidelines is already under way. Data and code should be cited in the reference sections of the primary research paper, in line with data-citation principles. This is welcome, as are similar directives from funding bodies, such as the 2022 ‘Nelson memo’ to US government agencies (see go.nature.com/3qkqzes).

Data for papers published in AGU journals5 show that, since deposition policies were implemented in 2019, most publication-related data have been deposited in two generalist repositories: Zenodo and figshare. (Figshare is owned by Digital Science, which is part of Holtzbrinck, the majority shareholder in Nature’s publisher, Springer Nature.) Many institutions maintain their own generalist repositories, again often without discipline-specific, community-vetted curation practices.

Disciplinary repositories, as well as a few generalist ones, provide this service, but it takes trained staff and time, usually several weeks at least. Data deposition therefore needs to be planned well in advance of a paper’s acceptance.

Source: Garbage in, garbage out: mitigating risks and maximizing benefits of AI in research

Managing research repositories in the framework of multi-institutional and state-of-the-art research collaborations: a case study

Sustained financial investments from funders, governments and institutions — that do not detract from research funds — are needed to keep suitable repositories running, and even just to comply with new mandates16.
