
Rubrik expands Annapurna for AI-ready unstructured data

Fri, 12th Jun 2026

Rubrik has expanded its Annapurna platform to prepare unstructured data for artificial intelligence use. The system scans and catalogues data across distributed environments.

The update centres on a layer designed to let organisations work with unstructured data without first moving large volumes of files into separate data lakes. Annapurna can publish a queryable catalogue of file metadata into a lakehouse, allowing data teams to pull only selected subsets for model training, fine-tuning and inference.

Unstructured data accounts for about 90 per cent of the data in most enterprise environments, according to Rubrik. Poor visibility across file estates has left much of that information siloed and inaccessible to data science and AI applications, while conventional extract, transform and load (ETL) processes have added cost and delay.

Rather than copying raw files into a new environment, Annapurna is designed to discover, scan and index data in place across network-attached storage, S3 and other object stores. The system can process billions of files and publish metadata into a lakehouse within hours rather than weeks, Rubrik said.

The move reflects a wider effort across the technology sector to reduce the complexity of preparing enterprise data for AI systems. Many companies have struggled to make use of large stores of documents, images and other unstructured files because those datasets often sit across older infrastructure, different business units and regulated environments.

Rubrik argues that existing approaches have required organisations to duplicate entire data estates, then spend months engineering pipelines to identify the small portion of information actually needed for AI workloads. That can leave businesses paying to move and store large volumes of unused data.

"For years, the model to make unstructured data usable for AI meant to move, transform, and store it twice, while paying for the whole estate just to use a fraction," said Anneka Gupta, Chief Product Officer at Rubrik.

"Annapurna completely inverts that model. It activates data right where it lives, delivers only what Data Intelligence platforms actually need and aligns infrastructure costs to consumption. That is how enterprises truly scale AI," Gupta said.

How it works

The platform operates on Rubrik Security Cloud, the company's management plane. Annapurna extends that environment into an unstructured data layer that can sit alongside existing storage and lakehouse systems without requiring new infrastructure or software agents.

A central element of the design is its lakehouse integration. Instead of handing over full file repositories, Annapurna publishes a metadata catalogue that downstream applications can query. Engineers can then identify exact file targets and stage only the datasets needed for a specific workflow.
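The query-then-stage pattern described above can be illustrated with a minimal sketch. It is not Rubrik's API: the table name `file_catalogue`, its columns and the sample rows are all hypothetical, and an in-memory SQLite database stands in for a lakehouse table. The point is only that the selection runs against metadata, so no file content moves until a target list exists.

```python
import sqlite3

# Hypothetical catalogue rows: (path, source, mime_type, size_bytes, modified).
# None of these names or values come from Rubrik; they are illustrative only.
SAMPLE_ROWS = [
    ("nas://finance/reports/q1.pdf", "nas", "application/pdf", 2_000_000, "2026-03-01"),
    ("s3://research/notes/a.txt", "s3", "text/plain", 4_000, "2026-05-10"),
    ("s3://research/papers/b.pdf", "s3", "application/pdf", 5_000_000, "2025-11-20"),
    ("nas://legal/contracts/c.pdf", "nas", "application/pdf", 1_500_000, "2026-04-15"),
]


def build_catalogue(rows):
    """Load metadata rows into an in-memory table standing in for a lakehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_catalogue ("
        "path TEXT PRIMARY KEY, source TEXT, mime_type TEXT, "
        "size_bytes INTEGER, modified TEXT)"
    )
    conn.executemany("INSERT INTO file_catalogue VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def select_for_staging(conn, mime_type, modified_since):
    """Return (path, size_bytes) for files matching the filter, using metadata only."""
    cur = conn.execute(
        "SELECT path, size_bytes FROM file_catalogue "
        "WHERE mime_type = ? AND modified >= ? ORDER BY path",
        (mime_type, modified_since),
    )
    return cur.fetchall()


catalogue = build_catalogue(SAMPLE_ROWS)
targets = select_for_staging(catalogue, "application/pdf", "2026-01-01")

# Compare the bytes that would be staged with the size of the whole estate.
staged_bytes = sum(size for _, size in targets)
total_bytes = sum(row[3] for row in SAMPLE_ROWS)
print(f"staging {len(targets)} files, {staged_bytes} of {total_bytes} bytes")
```

In this toy estate, only the two PDFs modified in 2026 would be staged, while the remaining files stay where they are.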

Rubrik said this creates what it describes as demand-driven pipeline economics, with costs linked to the amount of data pulled into AI workflows rather than the size of the full storage estate. That is intended to appeal to enterprises trying to contain spending as AI projects move from pilots to wider deployment.

Another part of the update focuses on governance and compliance. Annapurna preserves source-system access controls inside the catalogue so downstream platforms can continue enforcing permissions. According to Rubrik, this addresses a common problem in traditional data pipelines, where access rights can be stripped during transfers.
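As a rough illustration of why carrying permissions in the catalogue matters, the sketch below assumes each catalogue entry records the groups allowed to read the file at source. The `CatalogueEntry` type, the `allowed_groups` field and the group names are hypothetical and do not reflect how Annapurna actually represents access controls. A downstream platform could then filter entries against a user's groups before any file is staged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """Hypothetical catalogue record that keeps the source system's read groups."""
    path: str
    allowed_groups: frozenset


def visible_to(entries, user_groups):
    """Return paths of entries the user may read, based on group overlap."""
    groups = set(user_groups)
    return [e.path for e in entries if e.allowed_groups & groups]


entries = [
    CatalogueEntry("nas://finance/reports/q1.pdf", frozenset({"finance"})),
    CatalogueEntry("nas://legal/contracts/c.pdf", frozenset({"legal", "compliance"})),
    CatalogueEntry("s3://research/notes/a.txt", frozenset({"research"})),
]

analyst_view = visible_to(entries, ["finance", "compliance"])
outsider_view = visible_to(entries, ["marketing"])
print(analyst_view)
print(outsider_view)
```

If the permission metadata were dropped during transfer, the downstream platform would have nothing to filter on, which is the failure mode Rubrik describes.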

Each file staged into the platform's managed object store also carries lineage and versioning records from source through to AI output. Rubrik said those provenance functions support compliance efforts, including requirements related to the General Data Protection Regulation (GDPR).

Customer view

Piper Sandler is among the organisations Rubrik cited as using the approach in a highly regulated environment. Financial services groups have often faced particular difficulty in preparing distributed data for AI because of the controls required around sovereignty, retention and access.

"In financial services, managing petabytes of highly distributed, regulated, and siloed unstructured data across legacy and modern platforms was operationally limiting," said Corey West, Chief Technology Officer at Piper Sandler & Co.

"Annapurna provides an automated approach to map, govern, and index our estate for AI initiatives, reducing the friction of cross-functional configurations and data sovereignty requirements without needing another ETL stack or compromising our compliance posture," West said.

Rubrik said Annapurna is available now for qualified enterprise partners. Native lakehouse connector support is planned for later releases.
