Extending table descriptions

Experimental

Most governance in Conformare is captured in the code that builds the data. But upstream sources are often produced by a different team that does not use this package – and the important context (what a table means, who owns it, what to be careful of) lives only in people’s heads or a wiki.

If your organisation agrees a small standard for expressing purpose, owner, business context and risks, data engineers can put a structured description straight into the table’s own system comment (e.g. a Spark table comment). Conformare then reads and parses that comment, so a developer who sources the table is made aware of its risks and design notes – without the producing team adopting the library at all.

The format is yours

Conformare ships a simple default format – a free-text description followed by an @conformare block of key: value lines – but it is pluggable: register any parser to match the standard your business adopts.

Canonical channel reference table.

@conformare
purpose: Map channel_id to a channel name
owner: Data Engineering
context: Channel codes were remapped in 2026; pre-2026 joins are wrong
risk: definition.channel_remap | high | codes remapped in 2026 | owner=Data Engineering
risk: privacy.partner_confidential | medium | partner is confidential | column=channel

A risk: line is id | severity | note | key=value …, where the key/value pairs can set owner, mitigation, or column (to scope a risk to one column). Column comments are read too, and their risks are tagged with that column automatically.
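To make the default format concrete, here is a minimal sketch of how a comment in that shape could be parsed. It is not Conformare's own implementation: the function names `parse_comment` and `parse_risk` and the extra `description` key are hypothetical, chosen only for illustration.

```python
from typing import Any

MARKER = "@conformare"


def parse_risk(value: str) -> dict[str, Any]:
    """Split 'id | severity | note | key=value ...' into a dict."""
    parts = [p.strip() for p in value.split("|")]
    risk: dict[str, Any] = {"id": parts[0]}
    if len(parts) > 1:
        risk["severity"] = parts[1]
    if len(parts) > 2:
        risk["note"] = parts[2]
    # Any remaining fields are key=value pairs (owner, mitigation, column).
    for extra in parts[3:]:
        key, sep, val = extra.partition("=")
        if sep:
            risk[key.strip()] = val.strip()
    return risk


def parse_comment(comment: str) -> dict[str, Any]:
    """Parse free text followed by an @conformare block of key: value lines."""
    description, _, block = comment.partition(MARKER)
    doc: dict[str, Any] = {
        "description": description.strip(),  # hypothetical key, not in the docs
        "purpose": None,
        "owner": None,
        "contexts": [],
        "risks": [],
    }
    for line in block.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # blank or malformed line
        key, value = key.strip(), value.strip()
        if key == "risk":
            doc["risks"].append(parse_risk(value))
        elif key == "context":
            doc["contexts"].append(value)
        elif key in ("purpose", "owner"):
            doc[key] = value
    return doc


example = """Canonical channel reference table.

@conformare
purpose: Map channel_id to a channel name
owner: Data Engineering
context: Channel codes were remapped in 2026; pre-2026 joins are wrong
risk: privacy.partner_confidential | medium | partner is confidential | column=channel
"""
doc = parse_comment(example)
```

Splitting each line on the first colon only keeps values that themselves contain colons or semicolons intact, and unknown keys are ignored rather than rejected, so a producing team can add fields without breaking readers.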

To use a different convention, supply your own parser – any callable that takes the comment string and returns a dict keyed by purpose, owner, contexts and risks:

def my_parser(comment: str) -> dict:
    ...  # parse however your standard works
    return {"owner": ..., "risks": [{"id": ..., "severity": ...}]}

cf.set_comment_parser(my_parser)
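As a fuller illustration, suppose your organisation's standard were to store a JSON object in the table comment. That convention and the name `json_comment_parser` are assumptions made for this sketch, not something Conformare prescribes, and whether Conformare accepts `None` for a missing purpose or owner is also an assumption.

```python
import json


def json_comment_parser(comment: str) -> dict:
    """Hypothetical convention: the whole comment is a JSON object."""
    try:
        raw = json.loads(comment)
    except (TypeError, ValueError):
        raw = None  # not JSON (or no comment at all)
    if not isinstance(raw, dict):
        # Comment is not in the agreed format: report nothing rather than fail.
        return {"purpose": None, "owner": None, "contexts": [], "risks": []}
    return {
        "purpose": raw.get("purpose"),
        "owner": raw.get("owner"),
        "contexts": list(raw.get("contexts") or []),
        "risks": [
            {"id": r["id"], "severity": r.get("severity")}
            for r in (raw.get("risks") or [])
            if isinstance(r, dict) and "id" in r  # skip entries without an id
        ],
    }


parsed = json_comment_parser(
    '{"owner": "Data Engineering", "risks": [{"id": "definition.channel_remap", "severity": "high"}]}'
)
# Registration would then use the call shown above:
# cf.set_comment_parser(json_comment_parser)
```

Returning an empty result for comments that do not follow the standard means tables with ordinary free-text comments can still be read without errors.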

Read it on a Spark table

doc = cf.read_source_governance("lake.sales_channel", spark=spark)
# {'purpose': ..., 'owner': 'Data Engineering', 'contexts': [...], 'risks': [...]}

read_source_governance reads the table comment (and column comments) from the Spark catalog and parses them. For non-Spark systems, fetch the comment yourself and pass comment=....

Checking a source’s risk profile

Ad hoc, before relying on a table:

cf.check_source_comments(["lake.sales_channel"], spark=spark)
# -> CommentGovernanceWarning if the comment declares risk; returns purpose/owner/risks per table

Surfacing it while you track a pipeline

The cleanest way to have comment-declared risk show up automatically is to ingest it into the fleet centrally (for example, in a periodic governance sync over your catalog), where it becomes a static source risk:

cf.record_source_risk_from_comment("lake.sales_channel", spark=spark)

From then on it flows through the normal machinery: with cf.configure_store(..., warn_on_source=True) a pipeline that reads that table is warned on load, cf.check_upstream_risks([...]) lists it, and it appears in the fleet dashboard’s inherited-risk section – so the developer sees the risks, mitigations and design notes the producing team wrote, right where they consume the data.

Experimental. The default format and the reader API may change. Pin a parser you control if you depend on the exact shape.

