Changelog
All notable changes to spanforge-secrets are documented here.
The format follows Keep a Changelog. This project uses Semantic Versioning.
1.0.0 — 2026-04-14
Added
Scanning engine (scanner.py)
scan_payload()— recursively scans nested dict/list structures (JSON payloads) for PII and API key leakage. Walks up to configurablemax_depth(default 10).scan_text()— scans a raw string for PII and API keys. Useful for prompt files, JSONL lines, and arbitrary training data.PIIScanHitdataclass — frozen, with fieldsentity_type,path,match_count,sensitivity,category.PIIScanResultdataclass — frozen, with fieldshits,scanned,source; propertiescleanandviolation_count; methodto_dict().extra_patternsparameter on both scan functions for caller-supplied regex patterns.extra_sensitivityparameter to override sensitivity levels for custom patterns._is_valid_ssn()— SSA range validation (area, group, serial) to eliminate false-positive SSNs. Imported fromspanforge.redact._is_valid_date()— Calendar validation fordate_of_birthdetections to eliminate impossible dates. Imported fromspanforge.redact.
Pattern library (_patterns.py)
- 10 PII patterns imported from
spanforge.redact(email, phone, ssn, credit_card, ip_address, uk_national_insurance, date_of_birth, address, aadhaar, pan) — no duplication. date_of_birthpattern — five formats: ISO 8601 (YYYY-MM-DD), US (MM/DD/YYYY), Indian/European (DD/MM/YYYY),DD Mon YYYY, andMonth DD, YYYY. Provided byspanforge>=2.0.2.addresspattern — house number + word tokens + road suffix. Provided byspanforge>=2.0.2.- 5 API key patterns:
openai_api_key,anthropic_api_key,aws_access_key_id,aws_secret_access_key,gcp_service_account_key. _SENSITIVITY_MAP— maps all 15 entity types to their sensitivity level.
Luhn / Verhoeff (_luhn.py, _verhoeff.py)
- Re-export shims that delegate to
spanforge.redact._luhn_checkandspanforge.redact._verhoeff_checkwithout duplicating any algorithm code.
Audit chain verification (chain.py)
verify_chain_file()— reads a JSONL audit log and delegates tospanforge.signing.verify_chain().- UTF-8 decoding, 50 MB size guard, blank-line skipping, per-line error reporting.
- Returns a normalised dict:
valid,first_tampered,gaps,tampered_count,tombstone_count.
CLI (cli.py)
scansub-command: scans files, directories (recursive), or stdin.verify-chainsub-command: verifies HMAC audit chains.--secretflag for HMAC secret withSPANFORGE_HMAC_SECRETenvironment variable fallback.--format json(default) and--format sarifoutput modes.- SARIF 2.1.0 output compatible with GitHub Advanced Security / Code Scanning.
--diffmode: scans onlygit diff --stagedadded lines for use as a pre-commit hook.--ignore-fileflag and auto-detection of.spanforge-secretsignore.- Binary file detection (skips ~30 known binary extensions automatically).
- 50 MB file size guard.
- Non-UTF-8 file handling (skip with warning, not error).
- Exit codes: 0 (clean), 1 (violations or chain invalid), 2 (usage error), 3 (I/O/format error).
Packaging
spanforge>=2.0.2declared as a required runtime dependency; this is a reference implementation of the spanforge framework.py.typedmarker for PEP 561 type information.python -m spanforge_secretsentry point via__main__.py.
Tests
- 111 tests covering all entity types, validators, CLI commands, SARIF output, ignore files, diff mode, chain verification, and Indian date formats.
TestVerifyChainFileIntegration— 4 integration tests using realspanforge.signingandspanforge.eventAPIs, gated withpytest.importorskip.
Pre-release development
De-duplication
Removed all duplicated implementations of Luhn, Verhoeff, and the 8 base PII patterns — these are now imported directly from spanforge.redact. As of spanforge>=2.0.2, all 10 PII patterns (including date_of_birth and address) and validators (_is_valid_ssn, _is_valid_date) come from upstream — no local duplication.
Packaging alignment
Corrected pyproject.toml to declare spanforge as a required (dependencies) rather than optional dependency, consistent with the reference implementation intent.