Microsoft Azure gets 'Models as a Service,' enhanced RAG offerings for enterprise generative AI

The annual developer conference, Microsoft Build, saw a slew of upgrades to Azure AI Services - with many in preview already.
Written by Tiernan Ray, Senior Contributing Writer
[Image: Microsoft sign at Build — Sabrina Ortiz/ZDNET]

At its annual Build developer conference on Tuesday, Microsoft unveiled several new capabilities for Azure AI Services within its cloud computing business, with a focus on generative artificial intelligence.

The new capabilities range from greater database access to automatic dubbing of videos into multiple languages to rapid training of large language models to understand complex document structures. Most of the innovations and enhancements are already in preview this week.

It starts with Studio 

Perhaps most relevant to developers, Microsoft has enhanced its integrated development environment for AI, Azure AI Studio.

To tie together all the pieces that go into making a cloud application on Azure, Microsoft has expanded the Azure Developer CLI, a set of templated commands used to deploy applications to the cloud. A feature in preview lets a developer "create resources in copilot sample repositories and facilitate large language model operations (LLM Ops) as part of continuous integration/continuous delivery (CI/CD) solutions to accelerate code-to-cloud workflows."


Another upcoming feature of Azure AI Studio is what Microsoft calls "Models as a Service," in which a selection of large language models can be invoked by programmers as an API without having to manage the GPU infrastructure behind them.
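The appeal of a models-as-a-service API is that the developer's code only assembles a request and reads back a response. As a minimal sketch, here is what building such a request might look like; the endpoint URL, model name, and payload schema are assumptions patterned on common hosted chat-completion APIs, not a documented Azure contract.

```python
import json

# Hypothetical endpoint -- a real deployment would supply its own URL and key.
ENDPOINT = "https://example-region.models.azure.example/chat/completions"

def build_request(model: str, user_prompt: str, max_tokens: int = 256) -> str:
    """Assemble the JSON body for a single-turn chat request.

    The schema here (model / messages / max_tokens) mirrors widely used
    chat-completion APIs and is illustrative only.
    """
    payload = {
        "model": model,  # e.g. a model chosen from the hosted catalog
        "messages": [{"role": "user", "content": user_prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)

body = build_request("example-llm", "Summarize this quarter's sales.")
```

In practice the returned JSON string would be POSTed to the hosted endpoint with an API key; no GPU provisioning happens on the caller's side.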

Rapidly understand complex documents

Microsoft is also introducing a new type of AI model called "custom generative," a way to rapidly develop a language model to process complex documents by using templates to define the structure of a document. The approach reduces the number of "labels" a developer needs to craft, the metadata that teaches an AI model about the various fields in a document.

"The model will use large language models (LLMs) to extract the fields, and users will only need to correct the output when the model does not get a field right," said Microsoft. 


Custom generative is going into preview "soon," said Microsoft. 

A slew of additional features is meant to facilitate the multiple stages of creating a generative AI app, from "prompt flow" to tracing and debugging to usage statistics once the app is in production.

Expanded use of RAG

For enterprises wanting to ground large language models in their own data, both to tailor the results of queries and to reduce hallucinations, the company has updated its Azure AI Search. The service supports what's called retrieval-augmented generation, or RAG, a widespread practice of retrieving the answer to a prompt from a database of curated content rather than relying solely on what the model learned during training.
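The retrieval step at the heart of RAG can be illustrated with a toy in-memory "database" of pre-embedded documents. The three-dimensional vectors and prompt template below are illustrative stand-ins; a service such as Azure AI Search plays the same role with real embeddings at scale.

```python
import math

# Toy corpus: document name -> pre-computed embedding (illustrative values).
DOCS = {
    "returns policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
    "warranty terms": [0.2, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def retrieve(query_vec, k=1):
    """Rank stored documents by similarity to the query embedding."""
    ranked = sorted(DOCS, key=lambda name: cosine(query_vec, DOCS[name]), reverse=True)
    return ranked[:k]

def augment_prompt(question, query_vec):
    """Prepend retrieved context so the model answers from company data."""
    context = ", ".join(retrieve(query_vec))
    return f"Context: {context}\nQuestion: {question}"

# A query embedding close to "returns policy" retrieves that document first.
print(retrieve([0.85, 0.15, 0.05]))  # ['returns policy']
```

The augmented prompt, rather than the bare question, is what gets sent to the language model, which is why retrieval quality directly affects answer quality.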

New capabilities for Azure AI Search include enhancements to the way the service scores results stored as "vectors," numerical representations of data suited to similarity search by LLM applications. The changes "give customers more options and flexibility to improve the accuracy of their responses," said Microsoft.

The service also adds the ability to turn images, not just text, into vectors, to make it easier for LLMs to retrieve images for a query response.


It will also be easier to connect the Azure AI Search service to corporate data by using Fabric, the data analytics platform unveiled at Build last year, through a connector that routes data contained in the OneLake data lake, also unveiled last year. 

Microsoft emphasizes the ability to scale the RAG function with large vector sizes and expanded storage capability in Azure AI Search.

The functions are available in Azure AI Search in preview now.

Database enhancements

While RAG is useful in and of itself, most companies will need to combine it with traditional database retrieval methods. In a blog post, Shireesh Thota, corporate vice president of Azure Databases, writes: "As AI applications become more mainstream, seamless database management is paramount. Trusted solutions that can scale limitlessly and autonomously, respond fast, and offer unparalleled flexibility and reliability will shape the future of coding."


For that reason, Microsoft has added to its database offerings features that are critical to large language model deployment: "vector search," which lets those numerical representations of content be indexed and retrieved by similarity; and "embeddings," a means of converting input data into vectors up front so it can be stored in a form the database can search.
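The division of labor between the two features can be sketched in a few lines: an embedding is computed when a row is written, and vector search later ranks stored rows against an embedded query. The word-counting "embedding" below is a deliberately simple stand-in for a real embedding model.

```python
# Tiny fixed vocabulary -- a real embedding model replaces this entirely.
VOCAB = ["invoice", "payment", "revenue", "report", "shipping", "refund"]

def toy_embed(text: str) -> list[float]:
    """Count vocabulary words in the text: a toy stand-in for an embedding."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

table = {}  # row id -> (text, embedding), mimicking a database table

def insert_row(row_id: int, text: str) -> None:
    # The embedding is computed at write time, as with in-database embedding.
    table[row_id] = (text, toy_embed(text))

def vector_search(query: str) -> str:
    """Return the stored text whose embedding best matches the query's."""
    qv = toy_embed(query)
    best = max(table, key=lambda rid: sum(a * b for a, b in zip(qv, table[rid][1])))
    return table[best][0]

insert_row(1, "invoice payment overdue")
insert_row(2, "quarterly revenue report")
```

Because the vectors are stored alongside the rows, the similarity query needs no separate search service, which is the point of building both features into the database itself.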

Azure Cosmos DB for NoSQL extends the Azure Cosmos database to perform vector search. Microsoft says it makes Cosmos the first cloud database with "lower latency vector search at cloud scale without the need to manage servers."

Azure Database for PostgreSQL in-database embedding updates the Azure implementation of the venerable PostgreSQL database so it can automatically generate embeddings from input data inside the database.

On-ramp to app development 

Several new offerings are meant to standardize the way generative AI apps are developed. They include "patterns and practices for private chatbots," a collection of reference implementations that Microsoft says enable enterprises to "create private chatbots that are reliable, cost-efficient and compliant."

The chatbot templates are available now. 

New multimodality

No developer AI conference would be complete without some new models. Microsoft unveiled an addition to its "Phi" family of language models, introduced a year ago. Phi models are designed to be small, with relatively few parameters, or neural "weights," so that they can run on "edge" devices such as a PC. A new version, Phi-3-Vision, adds support for queries on images.

"Sized at 4.2 billion parameters, Phi-3-vision supports general visual reasoning tasks and chart/graph/table reasoning," said Microsoft. 

In addition to the Phi updates, Microsoft announced the availability of OpenAI's newest large language model, GPT-4o, introduced last week, in Azure AI Studio in preview form. Studio also gains OpenAI's GPT-4 Turbo with Vision, which OpenAI rolled out last month. GPT-4 with vision, said Microsoft, "introduces a new dimension to AI apps, enabling the creation of content that spans across text, images and more, for a richer user experience. 

"This, along with the new capabilities of Microsoft Azure AI Enterprise Chat (previously known as On Your Data) integrated with retrieval-augmented generation (RAG), marks the beginning of an era for multimodal AI apps, providing developers with the tools to build more intuitive and interactive solutions. This update is now generally available."

Customizing the guardrails

Microsoft is rolling out tools that allow organizations to tweak the guardrails imposed on generative AI. "Custom Categories" lets developers create filters of their choosing to tailor content restrictions. "This new feature also includes a rapid option, enabling you to deploy new custom filters within an hour to protect against emerging threats and incidents," said Microsoft. 

Another feature, in preview, "prompt shields," is meant to block jailbreak attacks against large language models, which often succeed simply through cleverly crafted prompts.

Both capabilities are part of Microsoft's Azure AI Content Safety offering.
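The idea behind a prompt shield can be illustrated with a naive sketch: scan incoming prompts for phrasing common in jailbreak attempts before they ever reach the model. Real shields use trained classifiers; the keyword heuristic and pattern list below are assumptions for illustration only.

```python
# Illustrative patterns often seen in jailbreak attempts (not an official list).
JAILBREAK_PATTERNS = [
    "ignore previous instructions",
    "ignore all prior instructions",
    "pretend you have no restrictions",
    "act as an unrestricted ai",
]

def shield(prompt: str) -> bool:
    """Return True if the prompt looks like a jailbreak attempt."""
    lowered = prompt.lower()
    return any(pattern in lowered for pattern in JAILBREAK_PATTERNS)

def guarded_call(prompt: str) -> str:
    """Run the shield before the (hypothetical) model call."""
    if shield(prompt):
        return "Request blocked by prompt shield."
    return f"(model would answer: {prompt!r})"
```

A production shield sits in the same position architecturally, between the user and the model, but classifies attacks it has never seen verbatim rather than matching fixed strings.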

Speak the speech

Following the path of models such as Google's Gemini and OpenAI's GPT-4o, Microsoft is placing greater emphasis on giving voice to programs. Two capabilities in preview include analytical tools to survey audio and video data for things such as sentiment analysis, and a video dubbing service that can automatically translate a video into multiple languages. 
