
DataHub has launched Context Intelligence, a tool aimed at a common failure of AI agents: not having enough context to pick the right data. The system analyzes existing SQL query logs to build a semantic index that helps agents find the right data. It integrates with agent protocols and frameworks such as MCP, LangChain, and Google’s Agent Development Kit, and it runs on the same infrastructure DataHub has used for years in production deployments.
The company’s roots trace back to LinkedIn, where DataHub began as a metadata management tool. It aimed to make data easier to find while ensuring proper usage. Shirshanka Das, co-founder and CTO, led data infrastructure there for nearly a decade. The open source project now has over 15,000 contributors and 3,000 global deployments. Das calls the new feature a “living knowledge base” for agents, helping them avoid incorrect joins by learning from past queries.
Postgres is the most-connected source in DataHub’s global deployments, followed by MySQL and major cloud warehouses like Snowflake. The platform supports over 100 metadata sources. This existing infrastructure now powers the new Context Intelligence layer, which extracts patterns from query logs and translates them into structured definitions for agents to use.
Warehouse query logs are noisy, so DataHub filters for “golden queries” — high-quality analyst work and proven pipelines. These become semantic anchors, helping agents understand which data matches which business questions. Das compares the process to “inverting text to SQL,” where natural language is mapped to structured definitions.
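To make the filtering idea concrete, here is a minimal sketch of how "golden queries" might be picked out of a noisy log and mined for join patterns. It is not DataHub's implementation: the log record shape, the trusted-user list, the run-count threshold, and the naive regular expression are all assumptions made for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class QueryLogEntry:
    """Hypothetical shape of one warehouse query-log record."""
    sql: str
    user: str
    succeeded: bool
    run_count: int


# Assumed heuristics: which users count as trusted and how often a query must run.
TRUSTED_USERS = {"analytics_team", "etl_pipeline"}
MIN_RUNS = 5

# Naive pattern: captures "FROM a JOIN b", optionally with a one-word alias after a.
JOIN_PATTERN = re.compile(r"\bFROM\s+(\w+)(?:\s+\w+)?\s+JOIN\s+(\w+)", re.IGNORECASE)


def is_golden(entry: QueryLogEntry) -> bool:
    """Keep successful, frequently run queries from trusted analysts or pipelines."""
    return (
        entry.succeeded
        and entry.user in TRUSTED_USERS
        and entry.run_count >= MIN_RUNS
    )


def join_pairs(entries: list[QueryLogEntry]) -> Counter:
    """Count which table pairs are joined together in golden queries."""
    counts: Counter = Counter()
    for entry in entries:
        if not is_golden(entry):
            continue
        for left, right in JOIN_PATTERN.findall(entry.sql):
            # Sort so (orders, customers) and (customers, orders) count as one pair.
            counts[tuple(sorted((left.lower(), right.lower())))] += 1
    return counts


LOG = [
    QueryLogEntry(
        "SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id",
        "analytics_team", True, 40,
    ),
    QueryLogEntry(
        "SELECT * FROM orders JOIN tmp_scratch ON 1 = 1",
        "intern_sandbox", True, 1,
    ),
    QueryLogEntry(
        "SELECT * FROM customers JOIN orders ON customers.id = orders.customer_id",
        "etl_pipeline", True, 300,
    ),
    QueryLogEntry(
        "SELECT * FROM orders JOIN refunds ON orders.id = refunds.order_id",
        "analytics_team", False, 12,
    ),
]

if __name__ == "__main__":
    print(join_pairs(LOG))
```

In this toy log, only the two trusted, successful, frequently run queries survive the filter, so the one-off sandbox join and the failed query never become anchors. A real system would need a proper SQL parser rather than a regular expression.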
Miro tested analytics agents against its Snowflake environment and found that giving them direct access to 10,000 tables confused them. The company responded by organizing its data into well-defined products, which limits what agents can see. A context layer now maps user requests through DataHub’s MCP to Snowflake’s MCP, using metadata and query history to guide SQL generation.
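The narrowing step can be sketched in a few lines. The example below is purely illustrative and does not reflect Miro's or DataHub's actual code: the `DataProduct` type, the keyword matching, and the table names are all hypothetical, and no real MCP calls are made.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical data product: a curated group of tables with a business meaning."""
    name: str
    tables: list[str]
    keywords: set[str] = field(default_factory=set)


def select_products(request: str, products: list[DataProduct]) -> list[DataProduct]:
    """Pick the products whose keywords appear in the user's request (naive matching)."""
    words = set(request.lower().split())
    return [p for p in products if p.keywords & words]


def build_context(request: str, products: list[DataProduct]) -> str:
    """Build the text handed to the SQL-generating agent, listing only allowed tables."""
    lines = [f"User request: {request}", "Tables the agent may query:"]
    for product in select_products(request, products):
        for table in product.tables:
            lines.append(f"- {table} ({product.name})")
    return "\n".join(lines)


# Assumed catalog: two products standing in for thousands of raw tables.
PRODUCTS = [
    DataProduct("revenue", ["fct_orders", "dim_customers"], {"revenue", "sales"}),
    DataProduct("engagement", ["fct_board_events", "dim_users"], {"engagement", "activity"}),
]

if __name__ == "__main__":
    print(build_context("monthly revenue by region", PRODUCTS))
```

The design choice here is that the agent never sees the full table list, only the handful of tables attached to the matching product, which is the effect the data-product approach is meant to achieve.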
Other vendors like Pinecone and Oracle offer contextual memory features. Microsoft’s Fabric IQ provides a semantic layer, but DataHub positions itself as platform-neutral. Its approach doesn’t replace existing tools but integrates with them, sending context into Snowflake’s semantic views and Microsoft’s Fabric IQ.
Kevin Petrie of BARC notes that DataHub’s ability to handle both structured and unstructured data, including documents and images, sets it apart. Many competitors focus only on structured tables and miss the richer context found in text. Michael Ni of Constellation Research highlights the shift from passive cataloging to continuous semantic intelligence as a key differentiator.
Ni argues the competition for context is a new platform war. Whoever controls runtime context controls decision-making for data, agents, and workflows. He warns buyers to clarify their needs: vector memory isn’t the same as business meaning, and governance isn’t execution. DataHub’s approach, he says, addresses these gaps by making context actionable and persistent.