myMBR-OS — Utility Reliability Standards

Every utility built as part of myMBR-OS must conform to these standards, which are derived from the my_backup system, a proven defense-in-depth architecture.

Reference: D:\FSS\Software\Utils\PythonUtils\my_backup\README.md


No utility can be trusted to report its own success. Every critical process must have an independent verifier that runs at a different time and can detect silent failures.

Pattern from my_backup:

  • Main process runs at 2 AM (Windows Task Scheduler)
  • Independent verifier runs at 8 AM (WSL cron) — checks snapshot age, not just exit code
  • If the 2 AM run silently fails, the 8 AM check detects it and fires an alert

Applied to myMBR-OS:

  • Rate scraper runs at 06:00 ET (cron)
  • Health checker runs at 09:00 ET (independent cron) — verifies data freshness, record counts, anomaly thresholds
  • If scraper silently fails, health checker detects stale data and fires alert
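
The health check described above can be sketched roughly as follows. This is a minimal illustration, not the my_backup or myMBR-OS implementation: `HealthThresholds`, `check_health`, and the 6-hour and 1-record limits are hypothetical names and values, and real limits would come from config.yaml.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class HealthThresholds:
    # Hypothetical limits; in practice these would be read from config.yaml.
    max_age: timedelta = timedelta(hours=6)
    min_records: int = 1


def check_health(last_updated, record_count, now, limits=HealthThresholds()):
    """Return a list of problems; an empty list means the data looks healthy."""
    problems = []
    age = now - last_updated
    if age > limits.max_age:
        problems.append(f"stale data: last update was {age} ago")
    if record_count < limits.min_records:
        problems.append(f"too few records: {record_count} < {limits.min_records}")
    return problems


# Simulated 09:00 check against a scrape that ran 3 hours ago, then 27 hours ago.
now = datetime(2025, 1, 2, 9, 0, tzinfo=timezone.utc)
fresh = check_health(now - timedelta(hours=3), 120, now)
stale = check_health(now - timedelta(hours=27), 120, now)
```

The check looks only at the stored data (its timestamp and size), so it still catches a scraper run that exited with code 0 but wrote nothing.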

Structured logging:

  • All output written to logs/[utility_name].log
  • Log rotation: keep last 5 runs
  • Format: [YYYY-MM-DD HH:MM:SS] LEVEL : message
  • Levels: INFO, WARNING, ERROR, CRITICAL
  • Every run starts with === RUN STARTED === and ends with RESULT: SUCCESS or RESULT: FAILED

Alerts on failure:

  • All critical failures trigger immediate alerts via notify_manager (email + Telegram)
  • Alert subjects: tiered by severity
    • Calm (INFO/success): [myMBR-OS] [utility]: OK — no issues
    • Loud (WARNING): [myMBR-OS] [utility]: WARNING — review required
    • Critical (ERROR/CRITICAL): [myMBR-OS] [utility]: CRITICAL — immediate action needed
  • Notification failures are non-blocking — never abort the main process

Weekly status report:

  • Weekly email summarizing the last 7 days of operation for all utilities
  • Calm subject on clean week; loud subject if any warnings or errors occurred
  • Delivered Sunday morning (mirror of my_backup’s pattern)

Independent verifier:

  • Every utility with scheduled runs must have a separate verification job
  • Verifier runs at a different scheduled time than the main process
  • Verifier checks outcomes (data freshness, record counts, expected values) — not just exit codes
  • Verifier fires its own alert if the main process appears to have failed silently

Dry-run mode:

  • Every utility supports --dry-run flag
  • Dry run logs what would happen without modifying any state
  • Required before any production deployment or config change

Test suite:

  • Every utility has a test suite in tests/
  • Tests cover: config validation, connectivity, data integrity, end-to-end pipeline
  • Includes at least one “restore/recovery” test that verifies data can actually be used downstream
  • Run: uv run pytest tests/ before any deployment

Configuration (config.yaml and .env):

  • All paths, thresholds, credentials, and feature flags defined in config.yaml
  • Secrets in .env (never committed)
  • .env.example uses placeholder values only
  • No hardcoded values in Python code

Failure handling:

  • Failures are logged and alerted, never silently swallowed
  • Pipeline continues what it can; accumulates failures
  • Final result: SUCCESS (all tasks passed), WARNING (some non-critical failures), or FAILED (any critical failure)
  • A FAILED result always triggers an alert
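
The logging format and the accumulate-then-classify rule above can be sketched together. This is an illustrative sketch under stated assumptions: `Task`, `run_pipeline`, and the demo task names are hypothetical, and the example logs to the console rather than to logs/[utility_name].log so that it runs anywhere.

```python
import logging
from typing import Callable, NamedTuple

# Matches the documented format: [YYYY-MM-DD HH:MM:SS] LEVEL : message
LOG_FORMAT = "[%(asctime)s] %(levelname)s : %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


class Task(NamedTuple):
    name: str
    run: Callable[[], None]
    critical: bool  # hypothetical flag: does a failure here fail the whole run?


def run_pipeline(tasks, logger):
    """Run every task, accumulate failures, return SUCCESS, WARNING, or FAILED."""
    logger.info("=== RUN STARTED ===")
    failures = []  # (task name, critical flag) for each task that raised
    for task in tasks:
        try:
            task.run()
            logger.info("%s: ok", task.name)
        except Exception as exc:  # log it and keep going; never swallow silently
            level = logging.CRITICAL if task.critical else logging.WARNING
            logger.log(level, "%s: failed: %s", task.name, exc)
            failures.append((task.name, task.critical))
    if any(critical for _, critical in failures):
        result = "FAILED"
    elif failures:
        result = "WARNING"
    else:
        result = "SUCCESS"
    logger.info("RESULT: %s", result)
    return result


def fail():
    raise RuntimeError("source unreachable")


logging.basicConfig(format=LOG_FORMAT, datefmt=DATE_FORMAT, level=logging.INFO)
log = logging.getLogger("rate-scanner")

outcome_ok = run_pipeline([Task("fetch", lambda: None, True)], log)
outcome_warn = run_pipeline(
    [Task("fetch", lambda: None, True), Task("enrich", fail, False)], log
)
outcome_failed = run_pipeline(
    [Task("fetch", fail, True), Task("store", lambda: None, True)], log
)
```

In a real utility the caller would send an alert whenever the result is FAILED. For per-run log rotation, one possible approach is the standard library's `logging.handlers.RotatingFileHandler` with a manual `doRollover()` at the start of each run.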

Define in config.yaml for each utility:

  • What changes are expected and can proceed automatically
  • What changes are anomalous and require human approval before proceeding
  • Example (rate scanner): >15% of rates changed since previous run → pause and alert

Annual fire drill:

  • Once per year: manually test the full recovery path for each utility
  • For rate scanner: delete the SQLite database, verify it can be reconstructed from source
  • For any downstream consumer: verify it can operate correctly on reconstructed data
  • Annual reminder email delivered Jan 1 (mirror of my_backup’s send_manual_reminders)
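
The rate scanner escalation example (more than 15% of rates changed since the previous run) might be checked along these lines. The function names, the dictionary layout keyed by plan ID, and the sample values are all assumptions made for illustration; the real threshold would be read from config.yaml.

```python
def changed_fraction(previous, current):
    """Fraction of previously known keys whose value changed or disappeared."""
    if not previous:
        return 0.0
    changed = sum(1 for key, old in previous.items() if current.get(key) != old)
    return changed / len(previous)


def needs_human_approval(previous, current, threshold=0.15):
    """True when the change rate exceeds the threshold and the run should pause."""
    return changed_fraction(previous, current) > threshold


# Hypothetical data: 20 plans, all at the same rate in the previous run.
previous = {f"plan-{i}": 0.10 for i in range(20)}
small_change = {**previous, "plan-0": 0.11}  # 1 of 20 changed (5%)
large_change = {**previous, **{f"plan-{i}": 0.12 for i in range(4)}}  # 4 of 20 (20%)

proceed_automatically = not needs_human_approval(previous, small_change)
pause_and_alert = needs_human_approval(previous, large_change)
```

Counting a missing key as a change is a deliberate choice in this sketch: a rate that disappears from the source is at least as suspicious as one that moves.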

All myMBR-OS utilities use the existing notify_manager system for alerts.

Reference: D:\FSS\Software\Utils\PythonUtils\notify_manager\

Usage pattern (mirror of my_backup):

from notifications import notify_manager  # or equivalent import

# Critical failure
notify_manager.send_alert(
    subject="[myMBR-OS] rate-scanner: CRITICAL — scraper failed",
    body=error_details,
    level="critical",
)
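
Because notification failures must never abort the main process, the call above would typically be wrapped. The sketch below is hypothetical: `safe_alert` is not part of notify_manager, and the two stand-in senders only imitate the keyword arguments of the `send_alert` call shown above so that the example runs without the real package.

```python
import logging

logger = logging.getLogger("rate-scanner")


def safe_alert(send_alert, subject, body, level):
    """Call send_alert, but never let a notification error reach the caller."""
    try:
        send_alert(subject=subject, body=body, level=level)
        return True
    except Exception:
        # Record the delivery failure in the utility log and carry on.
        logger.exception("alert delivery failed: %s", subject)
        return False


# Stand-ins for notify_manager.send_alert (assumed keyword signature).
sent = []


def recording_sender(subject, body, level):
    sent.append((subject, level))


def broken_sender(subject, body, level):
    raise ConnectionError("SMTP server unreachable")


delivered = safe_alert(
    recording_sender, "[myMBR-OS] rate-scanner: test alert", "details", "warning"
)
failed = safe_alert(
    broken_sender, "[myMBR-OS] rate-scanner: test alert", "details", "critical"
)
```

Returning a boolean lets the caller note in the run log that an alert was lost, without changing the run's own SUCCESS, WARNING, or FAILED result.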

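| Frequency | Time         | Task                                    |
|-----------|--------------|-----------------------------------------|
| Daily     | 06:00 ET     | Main process (e.g., rate scraper)       |
| Daily     | 09:00 ET     | Independent health check                |
| Weekly    | Sunday 09:00 | Status report email                     |
| Monthly   | 1st of month | Full maintenance run                    |
| Annually  | Jan 1        | Manual reminders + fire drill checklist |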

Each utility adapts this template to its own needs.


  • Structured logging implemented (rotation, format, levels)
  • Alerts on failure wired to notify_manager
  • Independent verifier job created (different cron time)
  • Weekly status report covers this utility
  • Dry-run mode implemented and tested
  • Test suite written and passing (including end-to-end)
  • config.yaml / .env.example created (no hardcoded values)
  • Human escalation thresholds defined
  • Annual fire drill added to reminder list
  • README.md written (mirrors my_backup style)