H2O.ai, an open source AI platform provider, today announced two new vision language models designed to improve document analysis and optical character recognition (OCR) tasks.
Named H2OVL Mississippi-2B and H2OVL Mississippi-0.8B, the models offer competitive performance against much larger models from leading technology companies and may provide a more efficient solution for companies with document-heavy workflows.
David vs. Goliath: How H2O.ai’s small model outsmarts big tech companies
Although the H2OVL Mississippi-0.8B model has only 800 million parameters, it outperformed all other models on the OCRBench text recognition task, including models with billions of parameters or more. Meanwhile, the 2 billion parameter H2OVL Mississippi-2B model demonstrated strong general performance across a variety of visual language benchmarks.
“We designed the H2OVL Mississippi models to be a high-performance yet cost-effective solution to bring AI-powered OCR, visual understanding, and Document AI to businesses,” said Sri Ambati, CEO and founder of H2O.ai, in an exclusive interview with VentureBeat. “By combining advanced multimodal AI and efficiency, H2OVL Mississippi delivers accurate and scalable Document AI solutions to a variety of industries.”
The release of these models marks an important step in H2O.ai’s strategy to make AI technology more accessible. H2O.ai makes its models freely available on Hugging Face, a popular platform for sharing machine learning models, allowing developers and companies to modify and adapt them to suit their specific document AI needs.
H2O.ai’s new H2OVL Mississippi-0.8B model (far right, yellow) outperforms larger models from leading technology companies on text recognition tasks on the OCRBench dataset, showing the potential of smaller, more efficient AI models for document analysis. (Credit: H2O.ai)
Combining efficiency and effectiveness: a new approach to document processing
Ambati highlighted the economic advantages of small, specialized models. “Our approach to generative pre-trained transformers stems from our significant investment in Document AI, where we work with customers to extract meaning from corporate documents,” he said. “These models can run anywhere, in a small footprint, efficiently and sustainably, allowing fine-tuning on domain-specific images and documents at a fraction of the cost.”
The announcement comes as companies seek more efficient ways to process and extract information from large volumes of documents. Traditional OCR and document analysis methods often struggle with low-quality scans, difficult handwriting, or heavily modified documents. H2O.ai’s new models aim to address these issues while providing a more resource-efficient alternative to large language models, which can be overkill for certain document-related tasks.
Industry analysts say H2O.ai’s approach could disrupt the current landscape dominated by tech giants. By focusing on smaller, more specialized models, H2O.ai could capture a significant portion of the enterprise market that values efficiency and cost-effectiveness.
On average scores across eight single-image benchmarks, H2O.ai’s new H2OVL Mississippi-2B model (yellow) outperforms several competitors, including models from Microsoft and Google. It trails only Qwen2-VL-2B in overall performance among similarly sized vision language models. (Credit: H2O.ai)
Open Source and Enterprise Ready: H2O.ai’s AI Deployment Strategy
“At H2O.ai, making AI accessible isn’t just an idea; it’s a movement,” Ambati told VentureBeat. “We are expanding the possibilities for creating and using AI by releasing a series of small base models that can be easily tweaked for specific tasks.”
H2O.ai has raised $256 million from investors including Commonwealth Bank, Nvidia, Goldman Sachs, and Wells Fargo. The company’s open source approach and focus on practical, enterprise-ready AI solutions have helped it build a community of more than 20,000 organizations and count more than half of the Fortune 500 as customers.
As enterprises continue to grapple with digital transformation and the need to extract value from unstructured data, H2O.ai’s new vision language models may present an attractive option for businesses looking to implement document AI solutions without the computational overhead of larger models. The real test will be in real-world applications, but H2O.ai’s demonstration of competitive performance with much smaller models suggests a promising direction for the future of enterprise AI.