# How Compliance Signals Work
We scan NEPA documents for patterns that indicate litigation risk. Every flag shows the exact text that triggered it and where it appears—no black box.
## What they are
Compliance signals are named risk flags aligned to real NEPA litigation. Courts often overturn projects for the same kinds of defects: deferred mitigation, missing environmental justice analysis, weak no-action alternative discussion, approval contingent on studies not yet done. We run eight detectors over the project’s main document(s); each hit is stored with a verbatim excerpt and character offset so you can see exactly what triggered it.
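To make the "verbatim excerpt plus character offset" idea concrete, here is a minimal sketch of what a stored flag record might look like. The field names are illustrative, not the actual schema in `backend/intelligence/signals.py`:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flag:
    """Hypothetical shape of one stored compliance signal."""
    flag_type: str    # e.g. "deferred_mitigation"
    severity: str     # "High" | "Medium" | "Info"
    excerpt: str      # verbatim sentence that triggered the detector
    char_offset: int  # position of the excerpt in the document text

# Example record: the offset lets the UI jump straight to the trigger.
flag = Flag(
    flag_type="deferred_mitigation",
    severity="High",
    excerpt="Mitigation measures will be developed during final design.",
    char_offset=1042,
)
```

Because the record is frozen, a flag can't be silently mutated after the scan, which keeps it safe to hash for attestation.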
## The eight flag types
| Flag | Severity | What it catches |
|---|---|---|
| deferred_mitigation | High | Mitigation pushed to “final design,” “future phases,” or “subsequent documentation.” |
| future_studies_reliance | High | Approval contingent on studies not yet completed. |
| ej_absent | High | Explicit statement that environmental justice is not addressed (EA/EIS only). |
| ej_thin_coverage | Medium | EJ mentioned only in passing or deferred to future NEPA (EA/EIS only). |
| no_action_absent | High | No-action alternative missing or not included (EA/EIS only). |
| no_action_thin | Medium | No-action alternative summarily dismissed or not adequately compared (EA/EIS only). |
| cumulative_impacts_thin | Medium | Cumulative impacts not meaningfully analyzed or deferred (EA/EIS only). |
| tribal_interests | Info | Tribal consultation or tribal interests mentioned—review for completeness. |
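The detectors behind the table can be pictured as a registry mapping flag names to a compiled pattern and a severity. The patterns below are simplified stand-ins, not the production regexes:

```python
import re

# Illustrative detector registry; real patterns in signals.py are richer.
DETECTORS = {
    "deferred_mitigation": (
        re.compile(r"\b(final design|future phases|subsequent documentation)\b", re.I),
        "High",
    ),
    "future_studies_reliance": (
        re.compile(r"\bstudies? (?:to be|not yet) (?:completed|conducted)\b", re.I),
        "High",
    ),
    "tribal_interests": (
        re.compile(r"\btribal (?:consultation|interests?)\b", re.I),
        "Info",
    ),
}

text = "Detailed mitigation will be specified in future phases of the project."
hits = [name for name, (pattern, _sev) in DETECTORS.items() if pattern.search(text)]
```

Here `hits` contains only `deferred_mitigation`: the sentence defers mitigation but says nothing about pending studies or tribal consultation.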
## Process-type gating (CE vs EA/EIS)
Categorical Exclusions (CE) are approved under a different standard than EAs and EISs. CE templates typically don’t include environmental justice sections, no-action alternatives, or cumulative impacts analysis—by design. If we ran those detectors on CE text, we’d get meaningless noise.
So we gate five flag types to EA and EIS only: ej_absent, ej_thin_coverage, no_action_absent, no_action_thin, cumulative_impacts_thin. On CE documents we only run deferred_mitigation, future_studies_reliance, and tribal_interests. This keeps results relevant and cuts false positives.
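The gating logic reduces to a single set difference. This sketch assumes an `EA_EIS_ONLY_FLAGS` set like the one described for `backend/config.py`, with names taken from the text above:

```python
# Flags that only make sense for EA/EIS documents (per backend/config.py).
EA_EIS_ONLY_FLAGS = {
    "ej_absent",
    "ej_thin_coverage",
    "no_action_absent",
    "no_action_thin",
    "cumulative_impacts_thin",
}

ALL_FLAGS = EA_EIS_ONLY_FLAGS | {
    "deferred_mitigation",
    "future_studies_reliance",
    "tribal_interests",
}

def flags_for(process_type: str) -> set:
    """Return the detector set to run for a given NEPA process type."""
    if process_type == "CE":
        return ALL_FLAGS - EA_EIS_ONLY_FLAGS
    return ALL_FLAGS  # EA and EIS run all eight detectors
```

So `flags_for("CE")` yields exactly the three detectors named above, and `flags_for("EIS")` yields all eight.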
## Why regex (not an LLM)?
We use deterministic regex patterns for three reasons:
- Explainability: Every flag includes the exact sentence that triggered it and its position in the document. An LLM can’t point to a specific character offset.
- Testability: We have 18 pytest tests (should-fire and should-not-fire per flag). We can prove the detectors behave correctly and catch regressions. You can’t unit-test an LLM the same way.
- Determinism: The same document scanned twice always produces the same flags. That’s required for attestation—the hash we write to Solana must be reproducible.
The tradeoff is recall: we might miss creatively worded risk. We optimize for precision so lenders can trust that a flag is a real signal, not a false alarm.
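The explainability and determinism points fall out of the standard library directly: `re.finditer` returns the exact span of every hit, and the same input always yields the same spans. The pattern here is illustrative, not the production one:

```python
import re

# Simplified deferred-language pattern, for illustration only.
pattern = re.compile(r"deferred to (?:future|subsequent) \w+", re.I)

text = (
    "Wetland impacts are deferred to future phases. "
    "Noise analysis is deferred to subsequent documentation."
)

# Each hit carries its verbatim excerpt and its character offset.
hits = [(m.group(0), m.start()) for m in pattern.finditer(text)]

# Scanning the same text again produces byte-identical results,
# which is what makes the attestation hash reproducible.
assert hits == [(m.group(0), m.start()) for m in pattern.finditer(text)]
```

Slicing `text` at each reported offset recovers the excerpt exactly, which is how the UI can highlight the trigger in place.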
## Exclusions (reducing false positives)
Some phrases look like risk but aren’t. We explicitly exclude them:
- “prior to construction” — We do not flag this for deferred mitigation. It usually appears in committed mitigation language (“BMPs will be implemented prior to construction”), which is the opposite of deferred.
- “long-term monitoring plan will be developed” — We do not flag this as future-studies reliance. Long-term monitoring plans are often required by regulation; their development is expected, not a defect.
Matches that fall inside these excluded phrases are dropped before creating a flag.
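The exclusion pass can be sketched as a span-containment check: a candidate match is dropped if its span lies inside any excluded-phrase span. Function and variable names here are hypothetical:

```python
import re

# The two exclusions described above, as illustrative patterns.
EXCLUSIONS = [
    re.compile(r"prior to construction", re.I),
    re.compile(r"long-term monitoring plan will be developed", re.I),
]

def excluded(text: str, start: int, end: int) -> bool:
    """True if the span [start, end) falls inside an excluded phrase."""
    for pattern in EXCLUSIONS:
        for m in pattern.finditer(text):
            if m.start() <= start and end <= m.end():
                return True
    return False

text = "BMPs will be implemented prior to construction."
m = re.search(r"prior to", text)       # a would-be deferred-mitigation hit
drop = excluded(text, m.start(), m.end())
```

Here `drop` is `True`: the candidate match sits inside "prior to construction", so no flag is created for this committed-mitigation sentence.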
## Deduplication and scan flow
We deduplicate by (flag_type, normalized excerpt) so the same phrase doesn’t create multiple identical flags. The scan runs only on the project’s main documents (for an EIS we prefer the FEIS/DEIS as the analysis document). Results are stored in the flags table with project_id, document_id, excerpt, and char_offset so the UI (and attestation) can reference them precisely.
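The dedup step keys on the pair described above. This is a minimal sketch, with "normalized" taken to mean lowercased with whitespace collapsed (an assumption, since the source doesn't define it):

```python
def dedupe(flags):
    """Drop flags whose (flag_type, normalized excerpt) was already seen."""
    seen, unique = set(), []
    for f in flags:
        # Assumed normalization: lowercase + collapse runs of whitespace.
        key = (f["flag_type"], " ".join(f["excerpt"].lower().split()))
        if key not in seen:
            seen.add(key)
            unique.append(f)
    return unique

raw = [
    {"flag_type": "deferred_mitigation", "excerpt": "Deferred to final design."},
    {"flag_type": "deferred_mitigation", "excerpt": "deferred  to final design."},
]
result = dedupe(raw)
```

The two records normalize to the same key, so `result` keeps only the first; boilerplate repeated across chapters yields one flag instead of many.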
## Test suite
We maintain tests/test_signals.py with synthetic should-fire and should-not-fire cases for each flag. For example: a sentence like “mitigation measures will be developed prior to final design” should fire deferred_mitigation; “BMPs will be implemented prior to construction” should not. The suite has 18 tests and runs on every change so we don’t regress.
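A should-fire / should-not-fire pair looks roughly like this. The detector here is a hypothetical stand-in; the real one lives in `backend/intelligence/signals.py`:

```python
import re

# Illustrative stand-in for the deferred_mitigation detector and its exclusion.
DEFERRED = re.compile(r"\bprior to final design\b", re.I)
EXCLUDE = re.compile(r"\bprior to construction\b", re.I)

def fires(text: str) -> bool:
    """Sketch of one detector: pattern match minus exclusions."""
    return bool(DEFERRED.search(text)) and not EXCLUDE.search(text)

def test_should_fire():
    assert fires("Mitigation measures will be developed prior to final design.")

def test_should_not_fire():
    assert not fires("BMPs will be implemented prior to construction.")
```

Each flag gets both directions: a synthetic sentence that must trigger it and a near-miss that must not, so a pattern change that widens or narrows a detector fails loudly in CI.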
## Key files to look at
- `backend/intelligence/signals.py` — all detectors, `scan_document`, `scan_project`
- `backend/config.py` — `EA_EIS_ONLY_FLAGS`
- `tests/test_signals.py` — should-fire / should-not-fire tests per flag
## In one sentence
Eight regex-based detectors look for litigation-relevant patterns in NEPA text; process-type gating and explicit exclusions keep false positives low; every flag stores the exact excerpt and offset so the result is auditable and attestable.