OpenMetadata bridge

Experimental

OpenMetadata is a deployed metadata platform / data catalog – the central place an organisation discovers and governs data across its whole estate. Conformare is the opposite kind of tool: an in-process library that captures lineage and risk from the authored code as it runs. The two are complementary, so Conformare bridges to OpenMetadata in both directions.

client = cf.OpenMetadataClient("https://open-metadata.internal:8585", token="<jwt>")

The client is a thin wrapper over the OpenMetadata REST API (standard library only, no extra dependency). Pass it as client= to the import and export calls below, or pass base_url= and token= to those calls instead to have a client built for you.
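To illustrate what a standard-library-only wrapper of this kind involves, here is a minimal sketch using urllib. It is not Conformare's implementation: the helper names build_table_request and fetch_json are hypothetical, and the endpoint path (/api/v1/tables/name/{fqn}) and bearer-token header are assumptions about the OpenMetadata REST API that may differ in your version.

```python
import json
import urllib.parse
import urllib.request


def build_table_request(base_url: str, token: str, fqn: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a table entity looked up by FQN."""
    # Assumed endpoint shape; check it against your OpenMetadata version.
    path = "/api/v1/tables/name/" + urllib.parse.quote(fqn, safe="")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={
            "Authorization": f"Bearer {token}",  # assumed JWT bearer auth
            "Accept": "application/json",
        },
        method="GET",
    )


def fetch_json(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no network access; fetch_json(req) would send it.
req = build_table_request(
    "https://open-metadata.internal:8585",
    "<jwt>",
    "warehouse.db.schema.sales_channel",
)
```

Keeping request construction separate from sending makes the URL and headers easy to inspect or adapt when an endpoint changes between versions.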

Import: read governance OpenMetadata already holds

If your tables are already documented in OpenMetadata, Conformare can read that governance so that a developer who sources one of those tables sees its risks. It maps an OpenMetadata table entity to the same governance doc that the comment and dbt readers produce. It pulls from:

  • the table description (parsed with the active comment parser – so an @conformare block works), and the owner;
  • a conformare custom property (extension.conformare) – either a structured object ({purpose, owner, contexts, risks}) or a comment string;
  • tags under a risk classification (default ConformareRisk) – the tag name after the classification becomes the risk id;
  • the same on columns (column risks are scoped to that column).

# read the governance doc
doc = cf.read_openmetadata_governance("warehouse.db.schema.sales_channel", client=client)

# warn ad hoc when checking a source's risk profile
cf.check_openmetadata_risks(["warehouse.db.schema.sales_channel"], client=client)

# or ingest into the fleet as static source risks, so they then surface through
# warn_on_source / check_upstream_risks / the dashboard while you track a pipeline
cf.record_source_risk_from_openmetadata("warehouse.db.schema.sales_channel", client=client)
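The tag rule from the list above (the tag name after the classification becomes the risk id, and column risks stay scoped to their column) can be sketched as follows. This is an illustration, not Conformare's reader: the function names are hypothetical, the risk ids are made up, and the entity layout (a "tags" list whose items carry a "tagFQN" of the form Classification.Tag, plus a "columns" list) is an assumption about the OpenMetadata table JSON.

```python
def risk_ids_from_tags(tags, classification="ConformareRisk"):
    """Return the tag names that sit under the given risk classification."""
    prefix = classification + "."
    risk_ids = []
    for tag in tags:
        tag_fqn = tag.get("tagFQN", "")  # assumed field name
        if tag_fqn.startswith(prefix):
            # The part after "<classification>." is taken as the risk id.
            risk_ids.append(tag_fqn[len(prefix):])
    return risk_ids


def risks_from_table_entity(entity, classification="ConformareRisk"):
    """Split risk ids into table-level and per-column groups."""
    columns = {}
    for column in entity.get("columns") or []:
        ids = risk_ids_from_tags(column.get("tags") or [], classification)
        if ids:
            columns[column["name"]] = ids
    return {
        "table": risk_ids_from_tags(entity.get("tags") or [], classification),
        "columns": columns,
    }


# Hypothetical entity, trimmed to the fields this sketch reads.
entity = {
    "tags": [{"tagFQN": "ConformareRisk.stale_data"}, {"tagFQN": "Tier.Tier1"}],
    "columns": [
        {"name": "channel_id", "tags": []},
        {"name": "email", "tags": [{"tagFQN": "ConformareRisk.pii"}]},
    ],
}
risks = risks_from_table_entity(entity)
```

Tags from other classifications (Tier.Tier1 here) are ignored, so only tags under the risk classification turn into risk ids.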

Export: push a tracked run into OpenMetadata

After tracking a pipeline, push its table-level lineage (each upstream source table to each output table) and a per-output-table risk governance block into OpenMetadata:

cf.trackSpark()
# ... your pipeline ...
cf.export_to_openmetadata(client=client, resolve=to_om_fqn)

  • Lineage – for every output (sink) table, an edge from each upstream input (source) table is written via the lineage API. (Conformare’s intermediate dataframes are not tables, so the exported lineage is table-to-table – the granularity OpenMetadata models.)
  • Governance – the risks that feed each output table (direct, indirect-with-distance, process) are rendered into an @conformare block and written to that table’s conformare custom property, so the catalog carries the risk back to consumers.
  • resolve is a callable that maps a Conformare location (a path or table name) to an OpenMetadata table FQN; omit it if your locations are already FQNs. Per-entity failures are collected and returned, not raised.
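The to_om_fqn callable passed as resolve in the export call above is not defined on this page. One possible shape is sketched below, assuming locations are either bare or schema-qualified table names, or file paths that need an explicit mapping. The make_resolver helper, the service, database, and schema names, and the example path are all hypothetical.

```python
def make_resolver(service, database, default_schema, path_overrides=None):
    """Return a function mapping a location to a service.database.schema.table FQN."""
    overrides = dict(path_overrides or {})

    def to_om_fqn(location):
        # Explicit mappings win; file paths cannot be derived from a naming rule.
        if location in overrides:
            return overrides[location]
        if "/" in location:
            return location  # unmapped path: left unchanged in this sketch
        parts = location.split(".")
        if len(parts) == 1:
            return f"{service}.{database}.{default_schema}.{parts[0]}"
        if len(parts) == 2:
            return f"{service}.{database}.{parts[0]}.{parts[1]}"
        return location  # three or more parts: assumed to be qualified already

    return to_om_fqn


to_om_fqn = make_resolver(
    service="warehouse",
    database="db",
    default_schema="schema",
    path_overrides={"s3://bucket/raw/orders.parquet": "warehouse.db.raw.orders"},
)
```

With this resolver, to_om_fqn("sales_channel") yields warehouse.db.schema.sales_channel, which matches the FQN style used in the import examples.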

Round trip. Export risk to OpenMetadata, and the import side reads it straight back – the @conformare block written to the custom property is parsed by the same governance reader.

Experimental. The exact endpoints and entity fields can differ across OpenMetadata versions; the HTTP layer is isolated in OpenMetadataClient so that it is easy to adapt. Writing a custom property requires that property type to exist in your OpenMetadata instance, and, depending on version, the lineage API may need entity ids rather than FQNs. A future OpenLineage exporter would offer a vendor-neutral path to OpenMetadata (and DataHub / Atlan / Marquez).

