A silhouetted office worker is replaced by computer code. (Image credit: Getty)
As enthusiasm for generative AI continues to grow, with the technology being deployed in our apps, devices, and businesses, and new tools and use cases brought to market nearly every day, frontier models with hundreds of billions or trillions of parameters have been the focus for the past two years.
We also know that the rapid growth of large-scale AI models for language, voice, and video is putting a significant strain on resources. This has led to a resurgence of interest in nuclear power, with hyperscalers such as Microsoft, Google, and AWS making significant commitments to it. Nuclear power is expected to help supply the hundreds of billions of dollars' worth of data center infrastructure planned over the next few years.
While models with hundreds of billions or trillions of parameters, such as those developed by researchers at OpenAI, NVIDIA, Google, and Anthropic, are state-of-the-art, these power-hungry next-generation models have also proven to be far more powerful than most use cases require. It's like driving a Formula 1 race car in the middle of rush hour traffic.
This is where smaller models, which can run with far less energy and compute horsepower, come into play.
NVIDIA NIM and IBM Granite 3.0 offer a glimpse into the future of enterprise AI
Increasingly, we hear about small language models, with hundreds of millions to fewer than 10 billion parameters, that are highly accurate while consuming significantly less energy and costing far less per token.
At its GTC conference in March of this year, NVIDIA announced its NIM (NVIDIA Inference Microservices) software technology, which packages an optimized inference engine, industry-standard APIs, and support for AI models into a container for easy deployment. NIM can serve larger models as well as small ones, but the core idea is that an optimized container service with industry-specific models and APIs, whether for visualization, game design, drug discovery, or code writing, can significantly simplify the compute, data, models, and frameworks involved while also reducing the computational power needed to run AI workloads. We believe the recently announced partnership between NVIDIA and Accenture is a great example of combining computing, industry-specific microservices, and expertise to enable rapid adoption of AI within the enterprise.
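To make that concrete, here is a minimal sketch of querying a deployed NIM container through its OpenAI-compatible endpoint; the host, port, and model id below are illustrative assumptions rather than specifics from NVIDIA's documentation.

    from openai import OpenAI

    # Point the standard OpenAI client at the local NIM container instead of a
    # hosted service; NIM microservices expose an OpenAI-compatible REST API.
    client = OpenAI(
        base_url="http://localhost:8000/v1",  # assumed host/port for the container
        api_key="not-used-locally",           # a local deployment typically ignores the key
    )

    # Ask the packaged model a question exactly as you would any OpenAI-style API.
    response = client.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",   # illustrative model id; use your deployed model
        messages=[{"role": "user", "content": "Summarize the key risks in this contract."}],
        max_tokens=200,
    )
    print(response.choices[0].message.content)

Because the interface is OpenAI-compatible, existing applications can often be repointed at a local microservice by changing little more than the base URL.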
Last week, IBM announced its latest Granite 3.0 models, a family of small language models that has shown strong performance against comparably sized small models (7-8 billion parameters) such as Llama and Mistral. All three companies have developed flexible open-source options that can be tailored and optimized for business use cases, with impressive performance in areas such as math, language, and code. Llama has been a staple of open-source model development, but IBM's rapid improvements are notable. I also see these advances as important because they offer open-source models that can be used on clouds like AWS as well as on IBM's own watsonx platform. Enterprise-centric companies like IBM, with their software, models, and extensive consulting arm, understand that solving a given set of use cases often requires not just models but deep industry expertise, and they are an example of how an "AI for the enterprise" strategy can be effectively pursued.
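As a rough illustration of how accessible these open-source small models are, the following sketch loads a Granite 3.0 instruct model with Hugging Face transformers; the model id is an assumption based on IBM's public release, so substitute whichever variant you actually deploy.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed Hugging Face id for IBM's Granite 3.0 8B instruct model.
    model_id = "ibm-granite/granite-3.0-8b-instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Build a chat-formatted prompt and generate a short completion.
    messages = [{"role": "user", "content": "Write a SQL query that totals revenue by region."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=200)
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))

A model of this size can run on a single commodity GPU, which is precisely the point the small-model argument turns on.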
At the heart of it all is a combination of models and flexible infrastructure that allows companies to focus on outcomes-based AI projects, enabling the next wave of technological advancements such as agentic AI, assistants, automation, and digital labor at scale.
Research continues, but for businesses the future will likely be small models
The idea that a one-size-fits-all model with trillions of parameters is the holy grail of enterprise AI fails in many ways. Of particular note are the energy consumption and cost per token in well-defined use cases that actually require only a fraction of those parameters. If a billion parameters (at most) will do the job, you are better off running a small, specialized model tailored to your specific business use case. Additionally, with a small model, data lineage is better understood and data access can be limited to only what the use case requires, rather than the enormous volumes of data that massive models demand at scale, making it far easier to manage and address the growing number of data security, privacy, and sovereignty concerns.
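A quick back-of-the-envelope calculation shows why cost per token dominates this argument; the prices below are deliberately hypothetical placeholders, not quotes from any provider, so substitute your own rates before drawing conclusions.

    # Back-of-the-envelope comparison; both prices are hypothetical placeholders.
    frontier_price = 15.00  # assumed $ per 1M output tokens, frontier-scale model
    small_price = 0.20      # assumed $ per 1M output tokens, ~8B-parameter model

    monthly_tokens = 500_000_000  # e.g., a high-volume summarization workload

    for label, price in [("Frontier model", frontier_price), ("Small model", small_price)]:
        print(f"{label}: ${monthly_tokens / 1_000_000 * price:,.0f} per month")

Even with generous assumptions in the large model's favor, the gap compounds quickly at enterprise volumes.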
And there is no question that we want to continue researching and building the world's most sophisticated AI to support economic growth and help solve complex problems. For enterprises, however, small language and foundation models have proven to be the better option for many business use cases, offering a more sustainable, fit-for-purpose way to deploy AI at scale while significantly reducing costs. That combination cannot and should not be ignored by companies looking to leverage the potential of generative and agentic AI solutions.