# How Compliance Signals Work
We scan NEPA documents for patterns that indicate litigation risk. Every flag shows the exact text that triggered it and where it appears—no black box.
## What they are
Compliance signals are named risk flags aligned to real NEPA litigation. Courts often overturn projects for the same kinds of defects: deferred mitigation, missing environmental justice analysis, weak no-action alternative discussion, approval contingent on studies not yet done. We run eight detectors over the project’s main document(s); each hit is stored with a verbatim excerpt and character offset so you can see exactly what triggered it.
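To make the "verbatim excerpt plus character offset" idea concrete, here is a minimal sketch of what a stored flag record might look like. The field names are illustrative, not the actual schema in `backend/intelligence/signals.py`:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flag:
    """Hypothetical shape of one stored compliance signal."""
    flag_type: str    # e.g. "deferred_mitigation"
    severity: str     # "High" | "Medium" | "Info"
    excerpt: str      # verbatim sentence that triggered the detector
    char_offset: int  # position of the excerpt in the document text

# Example record: the offset lets the UI jump straight to the trigger.
flag = Flag(
    flag_type="deferred_mitigation",
    severity="High",
    excerpt="Mitigation measures will be developed during final design.",
    char_offset=1042,
)
```

Because the record is frozen, a flag can't be silently mutated after the scan, which keeps it safe to hash for attestation.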
## The eight flag types
| Flag | Severity | What it catches |
|---|---|---|
| deferred_mitigation | High | Mitigation pushed to “final design,” “future phases,” or “subsequent documentation.” |
| future_studies_reliance | High | Approval contingent on studies not yet completed. |
| ej_absent | High | Explicit statement that environmental justice is not addressed (EA/EIS only). |
| ej_thin_coverage | Medium | EJ mentioned only in passing or deferred to future NEPA (EA/EIS only). |
| no_action_absent | High | No-action alternative missing or not included (EA/EIS only). |
| no_action_thin | Medium | No-action alternative summarily dismissed or not adequately compared (EA/EIS only). |
| cumulative_impacts_thin | Medium | Cumulative impacts not meaningfully analyzed or deferred (EA/EIS only). |
| tribal_interests | Info | Tribal consultation or tribal interests mentioned—review for completeness. |
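The detectors behind the table can be pictured as a registry mapping flag names to a compiled pattern and a severity. The patterns below are simplified stand-ins, not the production regexes:

```python
import re

# Illustrative detector registry; real patterns in signals.py are richer.
DETECTORS = {
    "deferred_mitigation": (
        re.compile(r"\b(final design|future phases|subsequent documentation)\b", re.I),
        "High",
    ),
    "future_studies_reliance": (
        re.compile(r"\bstudies? (?:to be|not yet) (?:completed|conducted)\b", re.I),
        "High",
    ),
    "tribal_interests": (
        re.compile(r"\btribal (?:consultation|interests?)\b", re.I),
        "Info",
    ),
}

text = "Detailed mitigation will be specified in future phases of the project."
hits = [name for name, (pattern, _sev) in DETECTORS.items() if pattern.search(text)]
```

Here `hits` contains only `deferred_mitigation`: the sentence defers mitigation but says nothing about pending studies or tribal consultation.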
## Process-type gating (CE vs EA/EIS)
Categorical Exclusions (CE) are approved under a different standard than EAs and EISs. CE templates typically don’t include environmental justice sections, no-action alternatives, or cumulative impacts analysis—by design. If we ran those detectors on CE text, we’d get meaningless noise.
So we gate five flag types to EA and EIS only: ej_absent, ej_thin_coverage, no_action_absent, no_action_thin, cumulative_impacts_thin. On CE documents we only run deferred_mitigation, future_studies_reliance, and tribal_interests. This keeps results relevant and cuts false positives.
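The gating logic reduces to a single set difference. This sketch assumes an `EA_EIS_ONLY_FLAGS` set like the one described for `backend/config.py`, with names taken from the text above:

```python
# Flags that only make sense for EA/EIS documents (per backend/config.py).
EA_EIS_ONLY_FLAGS = {
    "ej_absent",
    "ej_thin_coverage",
    "no_action_absent",
    "no_action_thin",
    "cumulative_impacts_thin",
}

ALL_FLAGS = EA_EIS_ONLY_FLAGS | {
    "deferred_mitigation",
    "future_studies_reliance",
    "tribal_interests",
}

def flags_for(process_type: str) -> set:
    """Return the detector set to run for a given NEPA process type."""
    if process_type == "CE":
        return ALL_FLAGS - EA_EIS_ONLY_FLAGS
    return ALL_FLAGS  # EA and EIS run all eight detectors
```

So `flags_for("CE")` yields exactly the three detectors named above, and `flags_for("EIS")` yields all eight.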
## Why regex (not an LLM)?
We use deterministic regex patterns for three reasons:
- Explainability: Every flag includes the exact sentence that triggered it and its position in the document. An LLM can’t point to a specific character offset.
- Testability: We have 18 pytest tests (should-fire and should-not-fire per flag). We can prove the detectors behave correctly and catch regressions. You can’t unit-test an LLM the same way.
- Determinism: The same document scanned twice always produces the same flags. That’s required for attestation—the hash we write to Solana must be reproducible.
The tradeoff is recall: we might miss creatively worded risk. We optimize for precision so lenders can trust that a flag is a real signal, not a false alarm.
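The explainability and determinism points fall out of the standard library directly: `re.finditer` returns the exact span of every hit, and the same input always yields the same spans. The pattern here is illustrative, not the production one:

```python
import re

# Simplified deferred-language pattern, for illustration only.
pattern = re.compile(r"deferred to (?:future|subsequent) \w+", re.I)

text = (
    "Wetland impacts are deferred to future phases. "
    "Noise analysis is deferred to subsequent documentation."
)

# Each hit carries its verbatim excerpt and its character offset.
hits = [(m.group(0), m.start()) for m in pattern.finditer(text)]

# Scanning the same text again produces byte-identical results,
# which is what makes the attestation hash reproducible.
assert hits == [(m.group(0), m.start()) for m in pattern.finditer(text)]
```

Slicing `text` at each reported offset recovers the excerpt exactly, which is how the UI can highlight the trigger in place.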
## Exclusions (reducing false positives)
Some phrases look like risk but aren’t. We explicitly exclude them:
- “prior to construction” — We do not flag this for deferred mitigation. It usually appears in committed mitigation language (“BMPs will be implemented prior to construction”), which is the opposite of deferred.
- “long-term monitoring plan will be developed” — We do not flag this as future-studies reliance. Long-term monitoring plans are often required by regulation; their development is expected, not a defect.
Matches that fall inside these excluded phrases are dropped before creating a flag.
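The exclusion pass can be sketched as a span-containment check: a candidate match is dropped if its span lies inside any excluded-phrase span. Function and variable names here are hypothetical:

```python
import re

# The two exclusions described above, as illustrative patterns.
EXCLUSIONS = [
    re.compile(r"prior to construction", re.I),
    re.compile(r"long-term monitoring plan will be developed", re.I),
]

def excluded(text: str, start: int, end: int) -> bool:
    """True if the span [start, end) falls inside an excluded phrase."""
    for pattern in EXCLUSIONS:
        for m in pattern.finditer(text):
            if m.start() <= start and end <= m.end():
                return True
    return False

text = "BMPs will be implemented prior to construction."
m = re.search(r"prior to", text)       # a would-be deferred-mitigation hit
drop = excluded(text, m.start(), m.end())
```

Here `drop` is `True`: the candidate match sits inside "prior to construction", so no flag is created for this committed-mitigation sentence.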
## Deduplication and scan flow
We deduplicate by (flag_type, normalized excerpt) so the same phrase doesn’t create multiple identical flags. The scan runs only on the project’s main documents (for an EIS we prefer the FEIS/DEIS as the analysis document). Results are stored in the flags table with project_id, document_id, excerpt, and char_offset so the UI (and attestation) can reference them precisely.
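The dedup step keys on the pair described above. This is a minimal sketch, with "normalized" taken to mean lowercased with whitespace collapsed (an assumption, since the source doesn't define it):

```python
def dedupe(flags):
    """Drop flags whose (flag_type, normalized excerpt) was already seen."""
    seen, unique = set(), []
    for f in flags:
        # Assumed normalization: lowercase + collapse runs of whitespace.
        key = (f["flag_type"], " ".join(f["excerpt"].lower().split()))
        if key not in seen:
            seen.add(key)
            unique.append(f)
    return unique

raw = [
    {"flag_type": "deferred_mitigation", "excerpt": "Deferred to final design."},
    {"flag_type": "deferred_mitigation", "excerpt": "deferred  to final design."},
]
result = dedupe(raw)
```

The two records normalize to the same key, so `result` keeps only the first; boilerplate repeated across chapters yields one flag instead of many.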
## Test suite
We maintain tests/test_signals.py with synthetic should-fire and should-not-fire cases for each flag. For example: a sentence like “mitigation measures will be developed prior to final design” should fire deferred_mitigation; “BMPs will be implemented prior to construction” should not. The suite has 18 tests and runs on every change so we don’t regress.
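A should-fire / should-not-fire pair looks roughly like this. The detector here is a hypothetical stand-in; the real one lives in `backend/intelligence/signals.py`:

```python
import re

# Illustrative stand-in for the deferred_mitigation detector and its exclusion.
DEFERRED = re.compile(r"\bprior to final design\b", re.I)
EXCLUDE = re.compile(r"\bprior to construction\b", re.I)

def fires(text: str) -> bool:
    """Sketch of one detector: pattern match minus exclusions."""
    return bool(DEFERRED.search(text)) and not EXCLUDE.search(text)

def test_should_fire():
    assert fires("Mitigation measures will be developed prior to final design.")

def test_should_not_fire():
    assert not fires("BMPs will be implemented prior to construction.")
```

Each flag gets both directions: a synthetic sentence that must trigger it and a near-miss that must not, so a pattern change that widens or narrows a detector fails loudly in CI.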
## Key files to look at
- `backend/intelligence/signals.py` — all detectors, `scan_document`, `scan_project`
- `backend/config.py` — `EA_EIS_ONLY_FLAGS`
- `tests/test_signals.py` — should-fire / should-not-fire tests per flag
## In one sentence
Eight regex-based detectors look for litigation-relevant patterns in NEPA text; process-type gating and explicit exclusions keep false positives low; every flag stores the exact excerpt and offset so the result is auditable and attestable.