Noam Chomsky has weighed in on the false promise of ChatGPT


How Artificial Language Models Fail: Holding the Creators of Large Language Models Accountable

The release of a handful of large language models has set off a new wave of discussion about what such systems can do. Their capabilities have been presented as extraordinary, mind-blowing, autonomous; at its peak, fascinated evangelists claimed that these models contain “humanity’s scientific knowledge”, are approaching Artificial General Intelligence (AGI), and even resemble consciousness. However, such hype is little more than a distraction from the actual harm perpetuated by these systems. People get hurt from the very practical ways such models fall short in deployment, and these failures are the result of choices made by the builders of these systems – choices we are obliged to critique and hold model builders accountable for.

The Google seizure faux pas makes sense given that one of the known vulnerabilities of LLMs is their failure to handle negation. Allyson Ettinger, for example, demonstrated this years ago with a simple study. When asked to complete a short sentence, a model would answer 100% correctly for affirmative statements (e.g., “A robin is…”) and 100% incorrectly for negative statements (e.g., “A robin isn’t…”). In fact, it became clear that the models could not distinguish between the two scenarios at all, offering the exact same completions (nouns such as “bird”) in both cases. Negation remains an issue with models today, and it is one of the rare linguistic skills that does not improve as models grow in size and complexity. Such errors reflect broader concerns raised by linguists about the extent to which such artificial language models operate via a trick mirror – learning the form of the English language without possessing any of the linguistic capabilities that would demonstrate actual understanding.
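To make this failure mode concrete, here is a minimal sketch of the kind of negation probe described above, using a masked language model through the Hugging Face transformers library. The model choice and prompts are illustrative assumptions, not Ettinger’s original stimuli.

```python
# Minimal negation probe: compare the top completions for an affirmative and a
# negated version of the same sentence frame. The model ("bert-base-uncased")
# and the prompts are illustrative stand-ins, not the original study's materials.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prompt in ["A robin is a [MASK].", "A robin is not a [MASK]."]:
    completions = fill_mask(prompt, top_k=3)
    print(prompt, [(c["token_str"], round(c["score"], 3)) for c in completions])
```

In practice, probes of this kind tend to yield near-identical top completions for the two prompts (“bird” among them), which is exactly the insensitivity to negation described above.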

Additionally, the creators of such models confess to the difficulty of addressing inappropriate responses that “do not accurately reflect the contents of authoritative external sources”. Galactica and ChatGPT have generated, for example, a “scientific paper” on the benefits of eating crushed glass and a text on how crushed porcelain added to breast milk can support infant digestion. In fact, Stack Overflow had to temporarily ban the use of ChatGPT-generated answers as it became evident that the LLM generates convincingly wrong answers to coding questions.

Yet, in response to this work, there are ongoing asymmetries of blame and praise. Model builders and tech evangelists alike attribute impressive and seemingly flawless output to a mythically autonomous model, a technological marvel. The human decision-making involved in model development is erased, and a model’s feats are treated as independent of the design and implementation choices of its engineers. But without naming and recognizing the engineering choices that contribute to the outcomes of these models, it becomes almost impossible to acknowledge the related responsibilities. Functional failures and discrimination are framed as devoid of engineering choices, blamed on society at large or on supposedly “naturally occurring” datasets, factors that those developing these models claim they have little control over. But they undeniably do have control, and none of the models we are seeing now are inevitable. It would have been entirely feasible for different choices to be made, resulting in an entirely different model being developed and released.

But ChatGPT and similar programs are, by design, unlimited in what they can “learn” (which is to say, memorize); they are incapable of distinguishing the possible from the impossible. Whereas humans are endowed with a universal grammar that limits the languages we can learn to those with a certain kind of mathematical elegance, programs like these learn humanly possible and humanly impossible languages with equal facility. And whereas humans are limited in the kinds of explanations we can rationally conjecture, machine learning systems can learn both that the earth is flat and that the earth is round. They trade merely in probabilities that change over time.
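As a rough illustration of what it means to trade in probabilities, consider a minimal sketch that reads off a small causal language model’s next-token distribution for a prompt. The model (GPT-2), the prompt, and the libraries used here are illustrative assumptions, not anything specific to the systems discussed above.

```python
# Sketch: inspect the next-token probability distribution of a small causal
# language model (GPT-2 as an illustrative stand-in). The model only ranks
# continuations by likelihood under its training data; nothing in this
# distribution distinguishes the possible from the impossible.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The earth is", return_tensors="pt")
with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]  # scores for the next token
probs = torch.softmax(next_token_logits, dim=-1)

top = torch.topk(probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(token_id)]):>12}  {prob.item():.3f}")
```

Retrain such a model on a different corpus and the same prompt yields a different ranking: the probabilities change, but no explanation is ever produced.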

For this reason, the predictions of machine learning systems will always be superficial and dubious. Because these programs cannot explain the rules of English syntax, for example, they may well predict, incorrectly, that “John is too stubborn to talk to” means that John is so stubborn that he will not talk to someone or other (rather than that he is too stubborn to be reasoned with). Why would a machine learning program predict something so odd? Because it might analogize from the pattern it inferred from sentences such as “John ate an apple” and “John ate”, in which the latter does mean that John ate something or other. The program might well predict that, because “John is too stubborn to talk to Bill” is similar to “John ate an apple”, “John is too stubborn to talk to” should be similar to “John ate”. The correct explanations of language are complicated and cannot be learned simply by marinating in big data.

Machine learning enthusiasts seem to be proud that their creations can produce correct “scientific” predictions without making use of explanations. But this kind of prediction, even when successful, is pseudoscience. While scientists certainly seek theories that have a high degree of empirical corroboration, as the philosopher Karl Popper noted, “we do not seek highly probable theories but explanations; that is to say, powerful and highly improbable theories.”

The theory that apples fall to earth because that is their natural place is possible, but it only invites further questions. (Why is earth their natural place?) The theory that apples fall to earth because mass bends space-time is highly improbable, but it actually tells you why apples fall. True intelligence is demonstrated in the ability to think and express improbable but insightful things.

Towards Moral Thinking in Artificial Intelligence: A Reappraisal for the Programmers of ChatGPT and Other Machine Learning Wonders

True intelligence is also capable of moral thinking. That means constraining the otherwise limitless creativity of our minds with ethical principles that determine what ought and ought not to be. To be useful, a system like ChatGPT must be capable of generating novel-looking output; to be acceptable to most users, it must steer clear of morally objectionable content. But the programmers of ChatGPT and other machine learning marvels have struggled, and will continue to struggle, to achieve this kind of balance.