Open Standards

Data Landscape

An opinionated, interactive map of the open standards.

The open standards that power a modern data architecture, organised by what they describe. Inspired by the CNCF Landscape, the ThoughtWorks Tech Radar, and Simon's talk on open standards. Click any standard to learn more, or download as PDF.

Single-vendor specs are muted

Definition — how data is described

Contracts
Data Products
Schema
Semantics

Storage — where data lives

File Formats
Open Table Formats
Storage Systems

Movement — how data flows between systems

Database Connectivity
Interconnection
Messaging

Transformation — how data is processed and reshaped

In-Memory Format
Processing

Discovery — how data is found and traced

Catalog APIs
Lineage

Operations — how data is queried, observed, governed

Query
Data Quality
Observability
Policies
AI Interfaces

FAQ

What do you mean by open standards?

An open standard, as used on this page, is a specification that anyone can read, implement, and build on — without paying a vendor for the privilege. Concretely, a spec qualifies if:

  • the specification text is published under an open licence (Apache, CC-BY, MIT, or a recognised standards-body licence);
  • governance is preferably independent, held by a foundation, working group, or community rather than a single vendor;
  • there are multiple independent implementations, or a credible path to them — one repo controlled by one company is not enough;
  • it is the de-facto standard for its slot in a modern data architecture, not a niche curiosity.

Origin doesn't matter — many of the entries here started as vendor specs (Iceberg at Netflix, Delta Lake at Databricks, gRPC at Google, OpenLineage at Datakin). What matters is whether the spec is openly governed and openly implementable today. The Status field in each entry's drawer makes the governance situation explicit (foundation-hosted, vendor-led, draft, etc.), so you can judge for yourself.

Why did we build this — and what's the origin story?

Because Entropy Data loves open standards and is building its product on top of them. ODCS, ODPS, OpenLineage, MCP, and the rest are the spine of our marketplace, and they are the same specs you can use without us. We also use this landscape ourselves to communicate with stakeholders on PoCs, to explain why a contract-first, vendor-neutral foundation is the cheaper long-term bet than yet another proprietary catalog. See www.entropy-data.com for the full story.

It started as a single slide. Simon was preparing a talk on open standards for data mesh for the Data Mesh Belgium meetup in Leuven (April 2026), and wanted one picture that answered "which standards actually matter, and where do they fit?" Every existing diagram either flattened everything into one box or focused on a single vendor's stack.

The slide kept growing. After the talk, enough people asked for "the picture" that turning it into an interactive, linkable page made more sense than mailing around a PNG. The page borrows its categorisation from the CNCF Landscape and its per-entry judgement from the ThoughtWorks Tech Radar, but it is narrower in scope: open standards only, no vendors. It's still a living view; suggestions and corrections are welcome.

The launch post on LinkedIn went unexpectedly viral — most of the standards added since the launch came in as comments, DMs, and pull requests in the days that followed. The contributor list at the bottom of this page is the visible tip of that.

Why do you call this a data landscape?

Fair pushback: most of what's here is metadata, not data. Schemas, contracts, lineage events, and catalog APIs all describe data rather than being data, and the page won't help you pick a vendor. Guilty as charged on both counts.

We still call it the "Data Landscape" because that's the conversation people are having. When teams say "our data stack" they mean the standards, formats, and protocols around the data, not the bytes themselves.

It's also deliberately not a vendor landscape. There's no Snowflake vs Databricks, no "best catalog of 2026". The CNCF Landscape catalogues vendors and projects; this one catalogues the open standards they should interoperate around. If you're picking a vendor, ask which of these standards they implement. That's the question this landscape helps you ask, not answer.

Why did you include vendor specs in an overview of open standards?

Most "open standards" started as vendor specs. Iceberg came out of Netflix, Delta Lake out of Databricks, and gRPC and Protobuf out of Google; OpenLineage was spun out of Datakin (now Astronomer) and incubated at LF AI & Data. What matters is whether the spec is openly governed and openly implementable today, not who wrote the first commit. See "What do you mean by open standards?" for the criteria we apply.

What do Adopt, Situational, Assess, and Caution mean?

The coloured header on each tile is our editorial judgement — what we'd actually do with this standard if we were starting a new project today. The four levels borrow the verb-style framing of the ThoughtWorks Tech Radar (Adopt / Trial / Assess / Hold), tuned for open standards rather than internal tech adoption:

  • Adopt — the standard you should reach for in new work. Proven, multi-vendor, clearly the default for its slot (e.g. SQL, JSON, HTTP, ODCS for data contracts, Iceberg for table format, OpenLineage for lineage).
  • Situational — the right answer in some contexts but not others. Pick deliberately based on the constraint (gRPC for service-to-service binary RPC, GraphQL for client-driven aggregation, GQL when you're already on graph databases).
  • Assess — promising but not yet proven for production-default use. Track it, prototype with it, but don't commit your architecture to it yet (e.g. OSI, Substrait, OORS).
  • Caution — we'd avoid for new work. Either superseded by a better option or fading from active use (e.g. MDX, JMS, XSLT). Listed because they're still encountered in existing systems.

Click a label in the toolbar legend to hide every tile of that judgement; click again to bring them back. Every standard's drawer carries the per-entry rationale (the Judgement reason line). The same information is in standards.json as the judgement and judgementReason fields if you want to disagree at scale. Within each category panel, tiles are ordered by judgement: Adopt first, Caution last.
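The ordering and legend filter described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the text names the judgement and judgementReason keys, but the "name" key, the top-level list shape, the exact spelling of the tier values, and the placeholder reason strings are all hypothetical, so check them against the real standards.json before relying on any of it.

```python
import json

# Tier order taken from the text: Adopt first, Caution last.
JUDGEMENT_ORDER = ["Adopt", "Situational", "Assess", "Caution"]

# Hypothetical excerpt standing in for standards.json. Only the keys
# "judgement" and "judgementReason" are named in the text; "name", the
# list-of-objects shape, and the reason strings are assumptions.
SAMPLE_JSON = """
[
  {"name": "MDX", "judgement": "Caution", "judgementReason": "placeholder reason"},
  {"name": "Substrait", "judgement": "Assess", "judgementReason": "placeholder reason"},
  {"name": "Iceberg", "judgement": "Adopt", "judgementReason": "placeholder reason"},
  {"name": "gRPC", "judgement": "Situational", "judgementReason": "placeholder reason"}
]
"""


def load_entries(text):
    """Parse the JSON text into a list of entry dictionaries."""
    return json.loads(text)


def order_by_judgement(entries):
    """Return entries ordered Adopt first, Caution last."""
    rank = {tier: i for i, tier in enumerate(JUDGEMENT_ORDER)}
    # sorted() is stable, so entries keep their relative order within a tier.
    # Unknown tiers sort after all known ones.
    return sorted(entries, key=lambda e: rank.get(e["judgement"], len(rank)))


def hide_judgements(entries, hidden):
    """Mimic the legend toggle: drop every entry whose tier is hidden."""
    return [e for e in entries if e["judgement"] not in hidden]


entries = load_entries(SAMPLE_JSON)
for entry in order_by_judgement(hide_judgements(entries, {"Caution"})):
    print(f'{entry["judgement"]:<12} {entry["name"]}: {entry["judgementReason"]}')
```

To disagree at scale, one option is to load the real file, overwrite the judgement values you would change, and re-run the same ordering to see how each category panel would look under your own tiers.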

Why is X listed, not listed, or marked as a vendor spec?

Why is X listed? Because it meets the four criteria above — openly licensed, independently governed (or noted as a vendor spec), multiple implementations, and de-facto for its slot. Each entry's drawer shows the Governance and Status we relied on; the same fields are in standards.json if you want to audit the whole set at once.

Why isn't X listed? Most likely we haven't gotten to it yet, or we judged it a vendor product rather than an open spec. The bar is the spec, not the popularity of any one implementation. If you think we're wrong, open an issue — the data is a single JSON file, PRs welcome.

Why is X greyed out / marked as a vendor spec? Vendor-led specs are openly published and de-facto, but governance is effectively controlled by one company: they meet every criterion except independent governance. We still list them because they matter (e.g. dbt, Protobuf, Schema Registry); the muted tile and greyscale logo are the caveat, not a downgrade.
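The listing rule in this FAQ can be read as a small decision function. The sketch below is one hedged interpretation rather than the site's actual logic: the Criteria class, its field names, and the returned labels are hypothetical, and the real assessment is an editorial judgement, not a boolean check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criteria:
    """The four criteria from the FAQ, reduced to yes/no answers (an assumption)."""
    open_licence: bool              # spec text published under an open licence
    independent_governance: bool    # foundation, working group, or community
    multiple_implementations: bool  # or a credible path to them
    de_facto: bool                  # the de-facto standard for its slot


def classify(c: Criteria) -> str:
    """Return a hypothetical label for how an entry would appear."""
    meets_other_three = c.open_licence and c.multiple_implementations and c.de_facto
    if not meets_other_three:
        return "not listed"
    # Vendor-led specs meet every criterion except independent governance.
    return "listed" if c.independent_governance else "listed as vendor spec (muted)"


print(classify(Criteria(True, True, True, True)))
print(classify(Criteria(True, False, True, True)))
print(classify(Criteria(True, True, False, True)))
```

The three calls cover a fully qualifying spec, a vendor-led spec, and a spec that fails on implementations.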

Where did legacy and niche go? Those used to be separate tags; they're now folded into the judgement. Standards we'd avoid for new work (XMLA, JMS, MDX) sit under Caution. Standards healthy only in a particular corner (ShEx, LinkML, GQL) sit under Situational or Assess depending on maturity. See the judgement explanation for the per-tier criteria.

Thank you

This landscape is curated by Entropy Data, with grateful thanks to everyone who helped shape it through suggestions, discussions, or pull requests (listed alphabetically by first name): Benjamin Ditel, Denis Arnaud, Erik Wilde, Jon Axon, Juan Sequeda, Marcel Grauwen, Mark M, Peter Hutzli, Prashanth Rao, Stefan Negele, and Thierry Jean.

Cite this landscape

If you reference this landscape in a talk, paper, or blog post, the canonical link is https://www.data-landscape.com/. A BibTeX entry is also available as data-landscape.bib.

Plain text (APA-style)
Harrer, S. (2026). Data Landscape: Open Standards for Modern Data Architecture. Entropy Data. https://www.data-landscape.com/

BibTeX
@misc{harrer2026datalandscape,
  author       = {Harrer, Simon},
  title        = {Data Landscape: Open Standards for Modern Data Architecture},
  year         = {2026},
  month        = apr,
  publisher    = {Entropy Data},
  howpublished = {\url{https://www.data-landscape.com/}}
}

Missed a standard? Spotted something wrong?

This landscape is a living view — suggestions and corrections welcome.