IT Brief Australia - Technology news for CIOs & IT decision-makers
Databricks acquires AI-centric data governance platform, Okera
Fri, 5th May 2023

Databricks, the lakehouse company, has announced that it has entered into a definitive agreement to acquire Okera, the AI-centric data governance platform.

Okera aims to solve data privacy and governance challenges across the spectrum of data and AI. It simplifies data visibility and transparency, helping organisations understand their data, which is essential in the age of LLMs and for addressing concerns about their biases.

Ali Ghodsi, Co-founder and CEO at Databricks, comments, "We're thrilled to welcome Nong and the incredibly talented Okera team to Databricks. We look forward to incorporating Okera's core capabilities directly into the Databricks platform in the coming year, further enhancing the unified, AI-centric governance experience delivered by Unity Catalog."

He continues, "For a decade, Databricks has focused on democratising data and AI for organisations around the world. And since the debut of ChatGPT last November, and the recent introduction of Dolly 2.0, every customer has been asking us how they can leverage the power of AI and large language models (LLMs) in their businesses. Immediately following those questions, they ask about how they can protect the security and privacy of their data in this new world."

Nong Li, Okera's Co-founder and CEO, is widely known for creating Apache Parquet, the open source standard storage format that Databricks and the rest of the industry build on.

Nong also played an instrumental role at Databricks earlier on. He led the vectorised Parquet effort and the codegen effort that resulted in Apache Spark 2.0's 10x performance improvement.

Li comments, "We founded Okera to help modern, data-driven enterprises accelerate legitimate data access while minimising data security risks and delivering regulatory compliance.

"As data continues to grow in volume, velocity, and variety across different applications, CIOs, CDOs, and CEOs across the board have to balance those two often conflicting initiatives, not to mention that, historically, managing access policies across multiple clouds has been painful and time-consuming."

He continues, "Many organisations don't have enough technical talent to manage access policies at scale, especially with the explosion of LLMs. What they need is a modern, AI-centric governance solution.

"We could not be more excited to join the Databricks team and to bring our expertise in building secure, scalable and simple governance solutions for some of the world's most forward-thinking enterprises."

Okera's data governance platform offers two unique technologies that can address the challenges of data governance in this new world, according to Databricks.

First, Okera offers an AI-powered interface to automatically discover, classify, and tag sensitive data such as personally identifiable information (PII). These tags enable data governance stakeholders to easily assess compliance and create no-code access policies that improve visibility and control over data.
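
As a rough illustration of the idea of tagging sensitive columns and then writing access rules against the tags, here is a minimal Python sketch. It is not Okera's or Databricks' implementation: the regex detectors, the function names `classify_columns` and `visible_columns`, and the sample data are all hypothetical, and a real AI-powered classifier would rely on far richer signals than two patterns.

```python
import re

# Hypothetical detectors, for illustration only. A production classifier
# would use trained models and many more rules than these two patterns.
PII_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "phone": re.compile(r"^\+?\d[\d\s-]{7,14}\d$"),
}


def classify_columns(table):
    """Tag a column when every sampled value matches a PII pattern."""
    tags = {}
    for column, values in table.items():
        for tag, pattern in PII_PATTERNS.items():
            if values and all(pattern.match(str(v)) for v in values):
                tags.setdefault(column, set()).add(tag)
    return tags


def visible_columns(table, tags, allowed_tags):
    """Tag-based policy: show a column only if the caller is cleared for all its tags."""
    return [c for c in table if tags.get(c, set()) <= allowed_tags]


# Made-up sample rows, keyed by column name.
sample = {
    "name": ["Ada", "Grace"],
    "email": ["ada@example.com", "grace@example.org"],
    "phone": ["+61 400 000 000", "0400-111-222"],
}

tags = classify_columns(sample)
analyst_view = visible_columns(sample, tags, allowed_tags=set())
support_view = visible_columns(sample, tags, allowed_tags={"email"})
```

In this sketch the policy refers only to tags, never to column names, so a newly discovered PII column would be hidden by default once it is tagged.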

Okera also provides a self-service portal to quickly audit and analyse sensitive data usage, giving organisations the ability to reliably monitor and track data usage patterns. This helps ensure that governance policies are applied consistently, even amid the explosion of data assets, many of which can be AI-generated.

Second, Okera has been developing a new isolation technology that can support arbitrary workloads while enforcing governance control without sacrificing performance.

This technology is in private preview and has been tested by a number of joint customers specifically on their AI workloads. It is key to ensuring that enterprises can efficiently cover the whole spectrum of applications in the new world.

The lakehouse is the best place to develop data and AI applications together, and to build LLMs, according to the company. Databricks' lakehouse vision is centred around the unification of these workloads on one platform. At the foundation of that vision lies Unity Catalog, the data governance layer for all data and AI workloads.

Databricks states its customers will benefit from being able to use AI to discover, classify and govern all their data, analytics, and AI assets (including ML models and model features) with attribute-based and intent-based access policies.

Additionally, they will benefit from end-to-end data observability on the lakehouse that allows them to centrally audit and report sensitive data usage across analytics and AI applications, and automatically trace data lineage down to the column level.

Ghodsi says, "With these enhancements, our customers will have a holistic view of their data estate across clouds and can use a single permission model to define access policies, accelerating AI use cases and ensuring consistent governance across the lakehouse.

"This forthcoming acquisition will also enable us to expose APIs for richer policies that other data governance partners can use, providing seamless solutions for our customers."