Risks & governance

Most data risk is created in code: a join that keeps an email address, a feature that quietly stands in for a protected attribute, a filter that drops a whole region. Yet the record of those risks usually lives somewhere else entirely: a spreadsheet, a slide deck, or a review document written after the fact, where it drifts out of step with what the pipeline actually does.

Conformare lets you record risks next to the code that creates them, so the governance picture is generated from the implementation and stays true to it.

What “risk” means here

A risk is a specific, named concern about what a step does to data. Every risk has:

  • a stable id (e.g. privacy.pii_exposure),
  • a category (the built-in catalog uses Privacy, Compliance, Security, Bias / Fairness, Data Quality and Operational; custom entries may name their own),
  • a human label and description, and
  • a severity (low < medium < high < critical).
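To make the shape of a risk concrete, here is a minimal, self-contained Python sketch of one catalog entry. It does not use Conformare: the RiskEntry and Severity classes are hypothetical stand-ins, and the severity given to privacy.pii_exposure is illustrative rather than the catalog's actual default. An IntEnum is one simple way to get the low < medium < high < critical ordering, so severities can be compared and sorted directly.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Ordered severities: comparisons follow low < medium < high < critical."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class RiskEntry:
    """Hypothetical stand-in for one catalog entry (not Conformare's own class)."""
    risk_id: str       # stable, dotted id, e.g. "privacy.pii_exposure"
    category: str      # e.g. "Privacy"
    label: str         # short human-readable name
    description: str   # what the concern actually is
    severity: Severity


# Illustrative entry; the severity here is an assumption, not the catalog default.
pii = RiskEntry(
    risk_id="privacy.pii_exposure",
    category="Privacy",
    label="PII exposure",
    description="Personal data is retained beyond need or carried to an export.",
    severity=Severity.HIGH,
)
```

Because Severity is an IntEnum, an expression such as max(r.severity for r in risks) gives the worst severity in a set of risks without any extra mapping.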

Conformare ships a built-in catalog of common risks, and you can register your own.

Capture, mitigate, own

Three things turn a noticed risk into managed governance:

  1. Capture. Name the risk on the code that gives rise to it, so it is visible and traceable.
  2. Mitigate. Record how the risk is handled (mask a field, broadcast a small table, add a contract test). A mitigation is a claim you can review.
  3. Own. Name a person or team accountable for it. Accountability is what makes a mitigation real: a control with no owner is a control nobody maintains.

Conformare ranks each captured risk’s governance state by whether it has a mitigation and an owner:

Risk state             Governance concern
Mitigated and owned    Low
Mitigated, no owner    Medium
Not mitigated          High

An unowned or unmitigated high-severity risk is precisely what a reviewer should see first.
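The ranking rule in the table is small enough to write down. The sketch below is hypothetical plain Python, not Conformare's implementation: governance_concern applies the three rows, and review_order puts the worst-governed risks first, breaking ties by severity. Ordering by concern before severity is an assumption made here, and the severities in the example register are illustrative.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class Concern(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class CapturedRisk:
    """Hypothetical record of a risk declared on some code."""
    risk_id: str
    severity: Severity
    mitigation: Optional[str] = None
    owner: Optional[str] = None


def governance_concern(r: CapturedRisk) -> Concern:
    """Apply the table: an unmitigated risk is High whether or not it has an owner."""
    if not r.mitigation:
        return Concern.HIGH
    if not r.owner:
        return Concern.MEDIUM
    return Concern.LOW


def review_order(risks: List[CapturedRisk]) -> List[CapturedRisk]:
    """Worst first: highest governance concern, then highest severity."""
    return sorted(risks, key=lambda r: (governance_concern(r), r.severity), reverse=True)


risk_register = [
    CapturedRisk("ops.expensive_action", Severity.MEDIUM, "Broadcast the small side", "data-platform"),
    CapturedRisk("privacy.pii_exposure", Severity.HIGH, "Mask email before any export"),
    CapturedRisk("fairness.proxy_variable", Severity.HIGH),
]

for r in review_order(risk_register):
    print(r.risk_id, governance_concern(r).name, r.severity.name)
```

With these inputs the unmitigated proxy-variable risk prints first, then the unowned PII risk, then the fully governed expensive action.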

Declaring risks

Declare a risk with risk(...) and attach it to a region of code, usually by passing it to describe(...) through its risks argument:

import conformare as cf

with cf.describe("Clean customers", purpose="Keep UK adults only",
                 risks=cf.risk("privacy.pii_exposure",
                               note="email retained through to export",
                               mitigation="Mask email before any export",
                               owner="data-governance")):
    # `customers` is an existing DataFrame loaded earlier in the pipeline
    adults = customers.filter(customers["age"] >= 18)

Risks that apply to the whole pipeline go on describe_process(...). If you prefer to keep governance out of the hot path, you can declare the same information in a function’s docstring (a Conformare: block) without changing any executable code. Register your own catalog entries with register_risk(...).

What you do with captured risks

  • Align before deployment. Walk the risk register in the interactive HTML report, or export a Formal risk checklist, a sign-off-ready document the governance team reviews, comments on and dates before go-live.
  • Generate audit evidence. Because risks live with the code, the next systems audit gets a current, dated artefact instead of a stale document: who owned which risk, how it was mitigated, and when it was reviewed.
  • Keep accountability visible over time. Ownership and mitigation travel with the pipeline, so gaps surface in review rather than in an incident.
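As an illustration of how gaps surface in review, the following hypothetical sketch renders captured risks as a dated, tick-box checklist and flags any risk missing a mitigation or an owner. RegisterRow and checklist are invented for this example; Conformare's Formal risk checklist has its own format, which this does not reproduce.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RegisterRow:
    """Hypothetical row of a captured-risk register (not Conformare's export schema)."""
    risk_id: str
    severity: str
    mitigation: Optional[str] = None
    owner: Optional[str] = None


def checklist(rows: List[RegisterRow], reviewed_on: date) -> str:
    """Render one tick-box line per risk, flagging missing mitigations or owners."""
    lines = [f"Risk checklist (reviewed {reviewed_on.isoformat()})"]
    for row in rows:
        # Collect the names of whichever governance fields are empty.
        gaps = [name for name, value in (("mitigation", row.mitigation), ("owner", row.owner))
                if not value]
        status = "OK" if not gaps else "GAP: no " + ", no ".join(gaps)
        lines.append(
            f"[ ] {row.risk_id} ({row.severity}): "
            f"mitigation={row.mitigation or '-'}; owner={row.owner or '-'}; {status}"
        )
    return "\n".join(lines)


rows = [
    RegisterRow("privacy.pii_exposure", "high", "Mask email before any export", "data-governance"),
    RegisterRow("model.drift", "high", "Weekly PSI monitor"),
]
print(checklist(rows, date.today()))
```

Dating the checklist at render time is what turns it into evidence: the same register rendered later shows whether a flagged gap was ever closed.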

Risks that matter in a business context

A few illustrative examples from the built-in catalog:

  • PII exposure (privacy.pii_exposure, Privacy). Personal data such as email or date of birth is retained beyond need or carried to an export. Business impact: regulatory exposure and breach blast-radius. Typical mitigation: mask or drop the field before export, owned by data governance.
  • GDPR processing (compliance.gdpr, Compliance). Processing falls under GDPR, so a lawful basis and data minimisation apply. Business impact: fines and mandated changes. Typical mitigation: a DPIA on file and a documented lawful basis, owned by the DPO.
  • Proxy variable (fairness.proxy_variable, Bias / Fairness). A feature (say, postcode) stands in for a protected attribute. Business impact: discriminatory outcomes and legal challenge. Typical mitigation: review and justify or remove the feature, owned by ML governance.
  • Expensive action (ops.expensive_action, Operational). A full-data join or collect is costly at scale. Business impact: cost overruns and missed SLAs. Typical mitigation: broadcast the small side or pre-aggregate, owned by the data platform team.
  • Schema drift (quality.schema_drift, Data Quality). An upstream column changes shape without notice. Business impact: silent corruption of downstream numbers. Typical mitigation: a data contract or a Great Expectations checkpoint, owned by the producing team.

Extend the catalog

Add organisation-specific risks once and reuse them everywhere:

cf.register_risk("model.drift", category="Model risk",
                 label="Model drift",
                 description="Live feature distribution diverges from training.",
                 default_severity="high")

with cf.describe("Score", risks=cf.risk("model.drift",
                                        mitigation="Weekly PSI monitor",
                                        owner="ml-platform")):
    ...
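Conceptually, a registry like this is a mapping from stable ids to catalog entries. The sketch below is a hypothetical, self-contained model of that idea, not the implementation behind register_risk: register_entry refuses unknown severities and duplicate ids, and lookup fails loudly on a misspelt id. Both checks, and the "medium" default severity, are design assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Dict

SEVERITIES = ("low", "medium", "high", "critical")  # ordered lowest to highest


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog entry; mirrors the fields passed to register_risk above."""
    risk_id: str
    category: str
    label: str
    description: str
    default_severity: str


_CATALOG: Dict[str, CatalogEntry] = {}


def register_entry(risk_id: str, *, category: str, label: str,
                   description: str, default_severity: str = "medium") -> CatalogEntry:
    """Add an organisation-specific risk; refuse unknown severities and duplicate ids."""
    if default_severity not in SEVERITIES:
        raise ValueError(f"unknown severity {default_severity!r}; expected one of {SEVERITIES}")
    if risk_id in _CATALOG:
        raise ValueError(f"risk id {risk_id!r} is already registered")
    entry = CatalogEntry(risk_id, category, label, description, default_severity)
    _CATALOG[risk_id] = entry
    return entry


def lookup(risk_id: str) -> CatalogEntry:
    """Resolve an id when a risk is declared, so typos fail loudly instead of silently."""
    try:
        return _CATALOG[risk_id]
    except KeyError:
        raise KeyError(f"no risk registered under {risk_id!r}") from None


register_entry("model.drift", category="Model risk", label="Model drift",
               description="Live feature distribution diverges from training.",
               default_severity="high")
print(lookup("model.drift").default_severity)  # high
```

Rejecting duplicates keeps an id stable: once model.drift means one thing, a later registration cannot silently redefine it for every pipeline that references it.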

See also: Data sensitivity, which flags where protected columns are used and whether they reach a written output.

