The Era of Ultra-Large AI Models Is Coming to an End

The Decline of Ultra-Large AI Models

Saying this era is coming to an end does not imply that we won't see new large AI models; rather, it suggests that AI companies are reconsidering the pursuit of ever-larger models as a primary research objective for the foreseeable future.

This discussion isn't a critique of recent advancements. While I remain skeptical of the mantra "scale is all you need," I acknowledge how greatly scaling has propelled the field forward.

A parallel can be drawn between the AI scaling race from 2020 to 2022 and the space race of the 1950s to 1970s; both periods spurred significant scientific advancement for various underlying motives.

However, a crucial difference exists. The innovation inherent in space exploration contrasts sharply with the current trend of simply increasing the size of AI models. The U.S. and USSR had to forge inventive pathways to achieve their goals in space. In contrast, AI companies have largely followed a predetermined route without a clear understanding of its efficacy or purpose.

One cannot place the cart before the horse; this distinction is vital and explains how we arrived at this juncture.

The Scaling Laws of Large Models

Various companies utilize AI to streamline processes, enhance efficiency, and lower costs. Others aim to advance scientific knowledge or improve the quality of life for individuals. Some even aspire to create what they term the "last invention"—often referred to as AGI, superintelligence, or true AI.

This objective has been a persistent goal since the field's inception in 1956, but it gained real traction in 2012 (with deep learning's breakthrough on ImageNet), surged again in 2017 (with the transformer architecture), and exploded in 2020 (with GPT-3).

A pivotal moment was OpenAI's exploration and implementation of scaling laws for large language models (LLMs). They recognized earlier than others that increasing model size—along with data and computing power—was essential for progress. This belief was substantiated in their January 2020 paper titled "Scaling Laws for Neural Language Models."
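To make the idea concrete, here is a minimal sketch of what such a scaling law looks like: test loss falling as a smooth power law of parameter count. The constants below are only rough approximations of the values reported in the paper and should be treated as illustrative, not authoritative.

```python
# Illustrative power-law scaling curve for LLM test loss, in the spirit of
# "Scaling Laws for Neural Language Models" (Kaplan et al., 2020).
# The constants are rough approximations, for illustration only.

def predicted_loss(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Test loss as a power law in parameter count: L(N) = (N_c / N) ** alpha."""
    return (n_c / n_params) ** alpha

for n in [1e8, 1e9, 1e10, 1.75e11]:  # from 100M parameters up to GPT-3 scale
    print(f"{n:10.0e} params -> predicted loss {predicted_loss(n):.2f}")
```

The practical lesson drawn from curves like this one is that loss keeps improving predictably as parameters, data, and compute grow, which is precisely what motivated building ever-larger models.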

In May 2020, OpenAI introduced GPT-3, a groundbreaking application of these scaling laws. This remarkably large model, with 175 billion parameters, positioned OpenAI well ahead of its competitors and made the notion that larger models could yield emergent properties, perhaps even true intelligence, feel tangible.

The Quest for AGI… or Something Else

From 2020 to 2022, most significant announcements in AI revolved around LLMs, with a few exceptions like AlphaFold. During this time, phrases like "AGI is on the horizon" and "scale is all you need" gained popularity.

OpenAI set a benchmark that led other major players—Google, Meta, Nvidia, DeepMind, Baidu, Alibaba—to quickly follow suit, each striving to outdo GPT-3. This wasn't merely competition; it was an effort to validate the scaling hypothesis: Did size truly matter? Could AGI be just around the corner?

Tech giants embraced the scale argument, eager to establish their presence in the AI arena. A snapshot of the landscape between mid-2021 and mid-2022 shows a flurry of new models:

  • Google: LaMDA (137B, May 2021), PaLM (540B, Apr 2022)
  • Meta: OPT (175B, May 2022), BlenderBot 3 (175B, Aug 2022)
  • DeepMind: Gopher (280B, Dec 2021), Chinchilla (70B, Apr 2022)
  • Microsoft-Nvidia: MT-NLG (530B, Oct 2021)
  • BigScience: BLOOM (176B, June 2022)
  • Baidu: PCL-BAIDU Wenxin (260B, Dec 2021)
  • Yandex: YaLM (100B, June 2022)
  • Tsinghua: GLM (130B, July 2022)
  • AI21 Labs: Jurassic-1 (178B, Aug 2021)
  • Aleph Alpha: Luminous (200B, Nov 2021)

This rapid development paints a dramatic picture, revealing a trend away from smaller-scale AI initiatives. Yet, the true motivations behind the race for hundreds of billions of parameters remain unclear.

While scale appeared to enhance performance, questions lingered about the real-world applicability of benchmark results. Could size alone lead to AGI? No one had definitive answers.

Model after model emerged, but companies seemed to be racing ahead without a coherent plan or understanding of their end goals. The title of "largest model" changed hands so frequently that it became challenging to track. In April 2022, Google released PaLM, and six months later no one had claimed the title from it.

Are we at the end of this race?

The Scaling Race: A New Direction

Eventually, the thrill of unveiling yet another "largest" model waned, making the news feel less significant. The advances yielded by larger models were often marginal, even if they were celebrated as significant breakthroughs. The pursuit of ever-larger models began to feel excessive and even unseemly.

The scaling race intensified rapidly, but there are signs it may be grinding to a halt. Research from the very companies involved suggests that the allure of scaling is increasingly being questioned.

Furthermore, the AI landscape does not operate in isolation. Many discussions surrounding AI, scale, AGI, and superintelligence often focus narrowly on selected aspects of reality, neglecting broader influences on technological progress.

This oversight sets the stage for misguided predictions. Below is a broad, though likely incomplete, list of reasons why the era of ultra-large AI models may be concluding. While these factors may not halt companies, they should prompt reflection before resuming an aimless race.

Technical Considerations

New Scaling Laws

DeepMind's Chinchilla was arguably the first genuine breakthrough in large-scale AI since GPT-3, demonstrating that the previously accepted scaling laws were incomplete. It revealed that the amount of training data, measured in tokens, is as crucial as the model's size.

Chinchilla, with "only" 70 billion parameters, quickly became the second most effective model across benchmarks, only surpassed by PaLM.

DeepMind's research indicated that many super-large models remain "significantly undertrained," leading to the conclusion that they are unnecessarily large. Why aim for larger models when there is potential for improvement at smaller sizes?
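A back-of-the-envelope way to see what "undertrained" means is to apply the commonly cited Chinchilla rule of thumb of roughly 20 training tokens per parameter. The sketch below does exactly that; the parameter and token counts are approximate public figures, and the 20:1 ratio is a simplification of the paper's full analysis.

```python
# Rough sketch of the Chinchilla compute-optimal heuristic:
# training tokens should scale with parameters, at roughly 20 tokens per parameter.
# All figures are approximate and for illustration only.

TOKENS_PER_PARAM = 20  # simplified rule of thumb from Hoffmann et al. (2022)

models = {                      # name: (parameters, training tokens), approximate
    "GPT-3":      (175e9, 300e9),
    "Gopher":     (280e9, 300e9),
    "Chinchilla": (70e9, 1.4e12),
}

for name, (params, tokens) in models.items():
    optimal = params * TOKENS_PER_PARAM
    verdict = "undertrained" if tokens < optimal else "roughly compute-optimal"
    print(f"{name:<10} {tokens / 1e9:>6,.0f}B tokens vs ~{optimal / 1e12:.1f}T optimal -> {verdict}")
```

By this heuristic, GPT-3 and Gopher were trained on a small fraction of the data their size calls for, while the much smaller Chinchilla was trained in roughly the right proportion.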

Prompt Engineering Limitations

Benchmark performance doesn't always correlate with real-world effectiveness. Users aiming to maximize their experience with models like GPT-3 must possess strong prompting abilities.

These skills depend not only on individual effort but also on collective knowledge about effective prompting techniques. Until we reach an optimal understanding of prompting, we may never fully grasp what LLMs can achieve.
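As a concrete illustration of how much phrasing matters, compare a bare zero-shot prompt with a few-shot prompt for the same task. These are generic examples of well-known prompting techniques, not prompts taken from any particular paper or product, but models like GPT-3 typically perform noticeably better on the second form even though the weights are identical.

```python
# Two prompts for the same sentiment-classification task.
# The few-shot version usually elicits much better behavior from an LLM.

zero_shot = "Classify the sentiment of this review: 'The battery died after two days.'"

few_shot = """Classify the sentiment of each review as Positive or Negative.

Review: 'Great screen and fast shipping.'
Sentiment: Positive

Review: 'The case cracked within a week.'
Sentiment: Negative

Review: 'The battery died after two days.'
Sentiment:"""
```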

Prompting resembles searching for an object in a dark room without knowing what you're seeking. If we haven't sufficiently explored the latent space, why develop larger models?

Suboptimal Training Conditions

Training LLMs is costly, forcing trade-offs between accuracy and expense; a full hyperparameter search at these scales is simply unaffordable. Consequently, many models are not fully optimized. For example, a single training run of GPT-3 is estimated to have cost OpenAI between $5 and $12 million.

Microsoft and OpenAI later found that GPT-3 could be optimized more cheaply by transferring the best hyperparameters from a much smaller proxy model: a 6.7-billion-parameter version of GPT-3 tuned this way outperformed the original 13-billion-parameter model.
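The general workflow, stripped of the width-dependent scaling rules the actual µTransfer method relies on, looks something like the sketch below: sweep hyperparameters on a cheap proxy model, then reuse the winner on the full-size model. Every function name and number here is a hypothetical stand-in, not code from the paper.

```python
# Simplified sketch of hyperparameter transfer: tune on a small proxy model,
# then reuse the best setting on the large model. The real method (muTransfer,
# Yang et al., 2022) depends on specific width-aware scaling rules; this only
# conveys the overall workflow. All names and values are hypothetical.

def validation_loss(model_size: int, learning_rate: float) -> float:
    # Placeholder: in practice, briefly train a model of this size and measure loss.
    return abs(learning_rate - 3e-4) * model_size ** 0.1

def tune_on_proxy(proxy_size: int, candidates: list) -> float:
    """Pick the learning rate that minimizes loss on the small proxy model."""
    return min(candidates, key=lambda lr: validation_loss(proxy_size, lr))

best_lr = tune_on_proxy(proxy_size=40_000_000, candidates=[1e-4, 3e-4, 1e-3, 3e-3])
print(f"Best learning rate found on the 40M-parameter proxy: {best_lr}")
# The large model is then trained once with best_lr instead of running
# an expensive hyperparameter sweep at full scale.
```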

Unsuitable Hardware

The advancement of deep learning owes much to the gaming sector. Initially designed for graphics, Nvidia GPUs have proven effective for AI workloads. However, as models grow larger, they no longer fit in the memory of a single GPU.
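Some rough arithmetic shows why. Assuming half-precision weights and a standard mixed-precision Adam setup, where weights, gradients, master weights, and optimizer moments add up to roughly 16 bytes per parameter, a GPT-3-sized model dwarfs the memory of any single accelerator. The byte counts below are common approximations, not exact figures for any specific framework or training run.

```python
# Back-of-the-envelope memory estimate for training a 175B-parameter model.
# Per-parameter byte counts are common approximations for mixed-precision Adam.

PARAMS = 175e9
BYTES_WEIGHTS_FP16 = 2     # half-precision copy of the weights alone
BYTES_TRAINING_STATE = 16  # fp16 weights + grads + fp32 master weights + Adam moments

print(f"Weights alone (fp16):       ~{PARAMS * BYTES_WEIGHTS_FP16 / 1e9:,.0f} GB")   # ~350 GB
print(f"Full training state (Adam): ~{PARAMS * BYTES_TRAINING_STATE / 1e9:,.0f} GB") # ~2,800 GB
print("Memory of a single 80 GB GPU: 80 GB")
```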

Engineers have utilized parallelization techniques to enable training across distributed systems. Nevertheless, as models continue to expand, they become increasingly challenging to manage.

Scientific Insights

Biological Neurons vs. Artificial Neurons

A study published in January 2020 in Science highlighted that dendrites can mimic the behavior of entire artificial neural networks (ANNs), showcasing the complexity of biological neurons compared to their artificial counterparts.

Research published in September 2021 in Neuron further established that artificial neurons are overly simplistic: it takes a multi-layer network of many artificial units to reproduce the input-output behavior of a single biological neuron.

These findings underscore the limitations of the foundational principles underpinning deep learning and neural networks, making direct comparisons between large models and human brains problematic.

For context, the human brain has approximately 100 billion neurons and about 1,000 trillion synapses. To match the complexity difference, an AI model would require around 100 quadrillion parameters—significantly exceeding the largest existing models.
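To make that comparison explicit, the arithmetic below simply reconciles the two figures above: roughly 10^15 synapses multiplied by an assumed factor of about 100 extra parameters per synapse, standing in for the within-neuron computation described earlier, gives the 100-quadrillion figure. The factor of 100 is an assumption backed out of those numbers, not an independently established quantity.

```python
# Order-of-magnitude comparison; every figure here is a rough estimate.

synapses = 1e15           # ~1,000 trillion synapses in the human brain
complexity_factor = 100   # assumed extra parameters per synapse to account for
                          # dendritic / within-neuron computation (an assumption)
brain_equivalent_params = synapses * complexity_factor  # 1e17 = 100 quadrillion

gpt3_params = 175e9
print(f"Gap vs GPT-3: ~{brain_equivalent_params / gpt3_params:,.0f}x")  # roughly 570,000x
```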

This raises questions about the wisdom of pursuing AGI solely through scaling.

The Question of Purpose

Science thrives on testable hypotheses. Relying on scale as a pathway to AGI is speculative. Without a clear objective, reliance on dubious empirical evidence can lead to flawed strategies.

What scientific rationale exists for developing larger models if companies lack clarity on their motivations or desired outcomes?

Construct Validity and Reliability Issues

Is benchmarking the most effective method to evaluate AI capabilities? Construct validity assesses how well a test measures an indirectly measurable concept, while reliability refers to the consistency of results under identical conditions.

Due to the design of many AI benchmarks, ensuring adequate validity and reliability is often challenging.

The Multimodal Nature of Reality

Is it reasonable to expect AGI to arise from LLMs, which focus solely on language? Given that the world is multimodal and our brain processes multiple types of information, exploring various modes of information integration in neural networks seems more promising than merely expanding text-centric models.

DeepMind's Gato, a generalist agent that handles text, images, Atari games, and robotic control with only 1.2 billion parameters, exemplifies this multimodal direction at a fraction of GPT-3's size.

The AI Art Movement

I've extensively discussed the evolution of AI art models. OpenAI's DALL·E 2, Midjourney, Stable Diffusion, and others represent a significant shift in 2022.

Notably, no generative visual AI model approaches the scale of the largest LLMs, with Google's Parti being the largest at 20 billion parameters. Many high-quality models exist in the 1–5 billion parameter range, making them easier to construct, train, and deploy.

As I previously noted, the accessibility of visual AI models and their smaller size compared to language models enhance their appeal for both companies and consumers.

Philosophical Considerations

Defining AGI

Terminology such as AGI, human-level AI, and superintelligence often lacks precision, indicating a limited understanding of the concepts involved. The absence of clear definitions and measurement tools creates a substantial gap between our understanding and reality.

How can we draw conclusions when AI models are opaque and reliant on blind prompting to assess their presumed emergent properties? The debate around LaMDA's potential sentience exemplifies this issue; without adequate definitions, the question itself becomes meaningless.

Human Cognitive Limitations

We often assume that if we succeed in creating AGI, we will recognize it. But what if our intelligence is not as broad as we think? What if human intelligence is a narrow form rather than a general one?

If that assumption holds and an AGI possesses human-level intelligence for every task, it would already be far more general than we are. Could it surpass us, or even sit right in front of us, without our noticing?

Existential Risks

Concerns arise about AI outpacing human intelligence too quickly. Shouldn't we ensure AI is aligned with human values before its creation? If AGI proves to be misaligned, it may be too late to rectify the situation.

It's ironic that those most worried about existential risks are also the ones who push for unrestrained scaling of AI models. Their reasoning becomes circular: since AI progress is inevitable, advancing AI ensures we can guide it correctly. However, it's this very reasoning that perpetuates the cycle of rapid advancement.

Addressing Alignment Challenges

Individuals concerned with existential risks also grapple with the alignment challenge: how to ensure that highly intelligent AI aligns with human values, especially as it surpasses humanity in intelligence.

This issue is not only complex but intangible. Even those focused on alignment struggle to determine how to achieve it, primarily due to the abstract nature of the challenges they aim to preempt.

Sociopolitical Dimensions

The Open-Source Movement

Open-source initiatives are reshaping the AI landscape beyond just the art domain. Projects like EleutherAI (GPT-NeoX-20B) and BigScience (BLOOM) exemplify non-profit efforts to provide accessible LLMs.

This trend disincentivizes companies from investing heavily in training and deploying large models, as there is no guaranteed return on investment when similar models are freely available.

The Limitations of LLMs

While LLMs excel in various tasks, they remain constrained by the data they are trained on. Developers often use information harvested from the internet, leading to issues with toxic content that can degrade model performance.

Moreover, LLMs can produce text that mimics human writing, making them effective tools for generating plausible misinformation. These limitations have prompted a backlash against AI companies, urging them to reconsider their training practices.

Environmental Impact

Although some view AI as the most pressing challenge we face, many argue that climate change is the more urgent issue. The environmental footprint of developing, training, and deploying LLMs is considerable, and it is especially hard to justify for models built without a clear purpose.

Economic Factors

Low Benefit-Cost Ratio

The financial investment in training LLMs is substantial. Larger models incur higher costs, and even with OpenAI's partnership with Microsoft for $1 billion in 2019, concerns about financial sustainability persisted.

OpenAI may not prioritize profit, but other companies do. Without tangible benefits, the high costs become unjustifiable.

Satisfactory Models

GPT-3 is sufficiently capable of handling the majority of tasks that users and businesses require. Companies focused on creating valuable products and services may find that pursuing larger models yields only marginal improvements, often undetectable to consumers.

Final Thoughts

The Utility of LLMs

Large AI models are not without value. The argument here is not that they are futile but rather that chasing an elusive ideal of what they might achieve is unproductive. While LLMs are useful, they do not possess reasoning or understanding comparable to humans.

Future of Larger AI Models

I do not suggest that we will never see a new largest AI model again. If AGI is achieved, it's likely that some aspect will involve a deep learning model larger than any currently existing.

However, the rationale for pursuing this path right now is decidedly weak.

Valid Arguments Regardless of Predictions

Even if scaling turns out to be crucial for AGI, the points made here remain valid. We have ample opportunities to learn from existing models and reflect on the limitations we've encountered, rather than merely expanding for the sake of expansion.

The Algorithmic Bridge is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Subscribe to The Algorithmic Bridge. Bridging the gap between algorithms and people—a newsletter about the AI that matters to your life.

You can also support my work on Medium directly and gain unlimited access by becoming a member using my referral link here! :)
