
LLMs have sparked a whirlwind of excitement and debate. Universities accused students of cheating with ChatGPT. Long-held assumptions about what only humans can do came into question. Skeptics said LLMs merely generate text, while proponents argued they could revolutionize knowledge work.
Over the last couple of years, LLMs have achieved a great deal. They have been writing code, drafting human-like prose, summarizing lengthy reports, and more, and the efficiency gains are hard to ignore. People embraced tools like ChatGPT with open arms: ChatGPT reached 100 million users within two months of launch, making it the fastest-growing consumer app in history.
So, are LLMs truly revolutionary or just inflated hype?
LLMs as a Revolution in AI
Proponents laud LLMs as a revolutionary advance in artificial intelligence, pointing to use cases with significant real-world benefits and economic potential.
LLMs are intelligent.
Modern LLMs can perform complex language tasks at a level that often surprises experts. For example, OpenAI's GPT-4 has demonstrated human-level performance on certain benchmarks; it even passed a simulated bar exam with a score in the top 10% of test-takers.

Reasoning models go a step further: they "think" before they answer. Using a chain-of-thought mechanism, these models spend extra time generating intermediate reasoning steps before producing a response, which represents a significant leap in LLMs' ability to handle complex queries.
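To make the idea concrete, here is a minimal sketch of the difference between a direct prompt and an explicit step-by-step prompt, which approximates what reasoning models do internally with their hidden chain of thought. It assumes the OpenAI Python SDK; the model name and the question are purely illustrative.

```python
# Minimal sketch: a direct prompt vs. an explicit "think step by step" prompt.
# Reasoning models perform this intermediate thinking internally; here we
# approximate the idea by asking an ordinary chat model to show its steps.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

question = "A train leaves at 3:40 pm and arrives at 6:05 pm. How long is the trip?"

direct = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model id
    messages=[{"role": "user", "content": question}],
)

step_by_step = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": question + "\nThink through the problem step by step, then give the final answer.",
    }],
)

print(direct.choices[0].message.content)
print(step_by_step.choices[0].message.content)
```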
LLMs are solving real-world problems.
In education, for instance, the nonprofit Khan Academy has piloted a GPT-4-powered tutor to give students personalized help, illustrating the potential to revolutionize learning. In software development, tools like GitHub Copilot and Blackbox are helping programmers code faster by autocompleting functions and suggesting fixes.
In marketing and copywriting, companies are using LLMs to draft copy, summarize documents, and power customer-service chatbots. LLM-based assistants are also being deployed in medicine (to summarize patient notes or suggest possible diagnoses) and law (to streamline contract analysis).
The sheer speed of adoption is telling: it suggests that LLMs address real user needs and deliver value across domains.
LLMs are being prioritized by investors.
Microsoft expects to spend USD 80 billion on AI-enabled data centers in fiscal year 2025. Investors are confident in the abilities of generative AI companies: OpenAI, for instance, raised a SoftBank-led USD 40 billion round at a valuation of USD 300 billion.
Google and Meta are also accelerating efforts to advance LLM research. It’s equally important to note that there are now competitors that are not ‘Big Tech.’
DeepSeek is a prime example. The Chinese startup triggered a sharp sell-off in AI-related stocks when it introduced R1, a powerful LLM capable of competing with ChatGPT and Llama, reportedly at a fraction of the cost incurred by the tech giants.
Returns on these AI investments are also expected. According to Morgan Stanley Research, generative AI revenue is likely to increase roughly 20-fold over the next three years, reaching USD 1.1 trillion by 2028.
LLMs could unlock great economic potential.
Beyond anecdote, early analyses suggest LLMs could unlock enormous economic value.
Morgan Stanley calls AI a $6 trillion opportunity, and the McKinsey Global Institute estimates generative AI could add $2.6 trillion to $4.4 trillion in annual productivity gains globally across industries.
These gains would come from automating routine tasks, improving software development efficiency, accelerating research, and augmenting human workers in countless roles.
Already, real productivity improvements are being reported. For example, customer support agents using AI chatbots to draft responses have seen significant boosts in throughput. Such figures fuel the view that LLMs are not just academic curiosities, but a revolutionary force for business and society at large.
LLMs: The Skeptical View
The other side of the coin gives us reasons to be cautious about the current LLM landscape.
LLMs aren’t reliable.
Computational linguist Emily M. Bender called large language models “stochastic parrots,” arguing that they merely mimic language without understanding meaning. She cautions that when an LLM’s answer is correct, it may be mostly by luck: “We have to keep in mind that when [an LLM’s] output is correct, that is just by chance,” Bender says. “You might as well be asking a Magic 8-ball.”
Simply put, these models lack a grounded grasp of reality or facts. They cannot verify truth, reason abstractly, or guarantee consistency. This leads to the well-documented issue of hallucinations: LLMs often generate confident-sounding statements that are completely false.
For example, an LLM might cite non-existent research or give detailed but made-up answers to questions. A dramatic real-world case occurred in mid-2023, when a pair of lawyers submitted a court brief written by ChatGPT that included six bogus case citations (entirely fake legal precedents the AI invented). The lawyers, who said they hadn’t imagined the AI would just fabricate cases “out of whole cloth,” were sanctioned for filing false information.
This incident underscores how unreliable LLM outputs can be if taken at face value, and why skeptics liken trusting an LLM without verification to trusting an articulate but clueless parrot.
LLMs still lack reasoning capabilities.
LLMs have improved their reasoning capabilities through techniques like chain-of-thought prompting, retrieval-augmented generation (RAG), and fine-tuning with human feedback; a minimal RAG sketch follows below.
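To ground the terminology, here is a minimal retrieval-augmented generation sketch: the passages most relevant to a question are retrieved and supplied as context, so the model's answer is anchored in those documents rather than in whatever it memorized during training. It assumes the OpenAI Python SDK and NumPy; the corpus, model names, and prompt are illustrative.

```python
# Minimal RAG sketch: embed a tiny corpus, retrieve the passages most similar
# to the question, and ask the model to answer using only those passages.
from openai import OpenAI
import numpy as np

client = OpenAI()  # expects OPENAI_API_KEY in the environment

corpus = [
    "GPT-4 scored in the top 10% of test-takers on a simulated bar exam.",
    "ChatGPT reached 100 million users within two months of launch.",
    "Retrieval-augmented generation grounds answers in retrieved documents.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(corpus)

def answer(question, k=2):
    q_vec = embed([question])[0]
    # Cosine similarity between the question and every document.
    sims = doc_vectors @ q_vec / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    context = "\n".join(corpus[i] for i in np.argsort(sims)[::-1][:k])
    prompt = (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model id
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(answer("How quickly did ChatGPT reach 100 million users?"))
```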
Despite newer models' ability to solve more complex math problems, navigate logic puzzles, or even perform multi-hop reasoning across documents, they still lack a grounded understanding of the physical world or an internal model of causality.
Apple's recent GSM-Symbolic study, which found that small changes to the wording of math problems can significantly degrade model performance, suggests that current LLMs, despite their apparent sophistication, are still best described as artificial narrow intelligence: systems that excel at specific, well-defined tasks but lack the flexible, adaptive reasoning and generalization abilities of human intelligence.
So even though LLMs excel at pattern recognition within familiar contexts, they lack the genuine logical reasoning and adaptability required for general intelligence.
While LLMs can greatly assist in drafting, summarization, and hypothesis generation, they remain unreliable as sole decision-makers in high-risk domains like medicine, law, and finance.
LLMs can perpetuate bias and spread misinformation.
Another set of criticisms centers on the ethical and societal risks of unleashing LLMs at scale. Because they are trained on huge internet datasets, LLMs can inherit and amplify biases or offensive content present in that data.
Without rigorous safeguards, a model might produce sexist or racist outputs, or spread disinformation, reflecting the unsavory aspects of its training corpus.
There have been cases of chatbots going off the rails when users intentionally or unintentionally provoke them with specific prompts.
Naveen Kumar, an associate professor at the University of Oklahoma's Price College of Business, co-authored a study that calls for addressing bias through ethical, explainable AI. "As international players like DeepSeek and Alibaba release platforms that are either free or much less expensive, there is going to be a global AI price race," Kumar said.
Addressing these concerns is crucial as LLMs are set to play a bigger role in finance, talent acquisition, marketing, and healthcare; bias has no place in such sensitive contexts. Another study suggests that generative language models can exhibit social identity biases.
LLMs are opaque.
Many state-of-the-art LLMs are proprietary systems trained on undisclosed data, which makes it difficult for outside experts to verify claims and benchmark models consistently. A Stanford study found that responsible AI reporting for LLMs lacks standardization: each company tests its model's safety and limits differently, making comparisons difficult.
Such opacity hinders accountability. For instance, when a major player releases a newer, more advanced model while withholding technical details for competitive and safety reasons, it raises concerns in the research community. If the performance of LLMs cannot be independently assessed or reproduced, some argue, bold claims should be met with healthy skepticism until validated.
There are also concerns that current evaluation metrics don’t capture the full scope of LLM behavior (for instance, a model might score high on a benchmark test but still make nonsensical errors in conversation). All these factors temper the “breakthrough” narrative by reminding us that LLMs remain unpredictable and not fully understood even by their creators.
LLMs could be overhyped.
Skeptics also point to signs that the LLM boom has elements of a hype bubble; the harsher among them compare the rise of LLMs to the dot-com bubble. Since tech history is full of hype cycles, the comparison is understandable, especially when so many businesses rebranded themselves around 'AI' to court venture capital.
Gartner's Hype Cycle for Emerging Technologies places generative AI between the 'Peak of Inflated Expectations' and the 'Trough of Disillusionment.'

Generative AI could face a correction if it fails to meet the lofty promises made on its behalf. We're already seeing some pullback: certain early enterprise adopters have paused deployments after finding that LLMs didn't perform as expected, or because of privacy concerns around sensitive data.
Researchers Widder and Hicks argue that even if the "AI hype bubble slowly deflates," it can still leave harmful lasting effects, such as wasted resources, disrupted labor markets, and erosion of trust.
These critics highlight that current LLMs:
(a) lack genuine understanding,
(b) often produce incorrect or biased outputs,
(c) are being over-credited with capabilities they do not yet have, and
(d) are surrounded by a hype-driven narrative that could mislead investors, policymakers, and the public.
Perception Matters: Implications for Funding, Regulation, and Trust
How enterprises balance the optimistic and skeptical views of LLMs will influence the trajectory of AI development. The prevailing perception is that generative AI is the door to the next big frontier: we are now on the cusp of agentic AI, where autonomous solutions such as agentic workflows can be built.
Funding
Funds are being poured into sustaining and scaling LLMs, which reflects strong belief in the technology's potential. More funding gives researchers room to pursue breakthroughs, such as reducing hallucinations or improving context understanding.
However, overinflated expectations carry a risk: a backlash could occur if the ROI on these hefty investments doesn’t materialize fast enough. There’s talk in some circles of an “AI bubble.” Venture capital may have overfunded AI startups in 2023, and 2025 could see a correction or crash if those startups struggle to deliver on hyperbolic promises.
Regulation
On the regulatory front, governments are under pressure to develop strategies and guardrails for LLM use. The European Union accelerated work on its AI Act, which imposes requirements on high-risk AI systems, including transparency obligations for generative models. Other countries are following suit.
Trust
Moreover, the public's trust in AI is delicate and can be swayed by hype and adverse incidents. Surveys show many people have mixed feelings about AI's growing role: a slim majority of Americans (52%) say they are more concerned than excited about AI in daily life, a share that has grown in recent years, while only a small minority (about 10%) are primarily excited by AI's advances. This wariness stems from fears of job displacement, privacy loss, or simply the unsettling idea of machines doing "human" things.
We must take a balanced approach to LLMs. If you’d like to discuss how LLMs could be useful for your enterprise, connect with us.