Disaster Recovery

Last Updated:
Purpose: High-level recovery planning and scenario overview.


Recovery Principles

  1. Stay calm — panic leads to mistakes
  2. Assess before acting — understand what's actually broken before touching anything
  3. Verify backups exist before starting any restore
  4. One thing at a time — don't change multiple things simultaneously
  5. Take notes: document what you do as you go, using the log template below

Accessing This Documentation During an Outage

| Situation | Documentation URL |
|---|---|
| Internet available, home network down | |
| Home network up, internet down | |
| Both down | Printed quick reference |

Scenario Overview

| Scenario | Impact | Recovery Doc |
|---|---|---|
| Power cut (normal recovery) | Services may need restarting | Partner Guide → Power Cut |
| Primary NAS failure | Main services down, data at risk | Primary NAS Recovery |
| Backup NAS failure | Backup storage and local docs down | Backup NAS Recovery |
| Proxmox NUC failure | VMs/containers down | Proxmox Recovery |
| Single container/service down | One service unavailable | Restart via Unraid Docker UI |
| Both NAS devices fail | Major data loss risk | See Catastrophic Loss below |

Service Recovery Priority

When several services are down at once, restore them in this order, one at a time:

  1. Network / DNS — everything else depends on this
  2. Remote access — so Dan can help remotely if needed
  3. Home Assistant — safety and daily routines
  4. Monitoring — visibility into what's healthy
  5. Everything else
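The ordering above can be turned into a small helper that sorts whatever is currently down into restore order. This is only a sketch: the service keys (`network-dns`, `remote-access`, `home-assistant`, `monitoring`, `media-server`) are illustrative names, not identifiers taken from this setup.

```python
# Priority tiers copied from the list above; a lower number is restored first.
# The service keys are hypothetical labels, not real container names.
PRIORITY = {
    "network-dns": 1,
    "remote-access": 2,
    "home-assistant": 3,
    "monitoring": 4,
}
DEFAULT_PRIORITY = 5  # "Everything else"


def restore_order(down_services):
    """Return the services sorted by recovery priority, ties broken by name."""
    return sorted(
        down_services,
        key=lambda name: (PRIORITY.get(name, DEFAULT_PRIORITY), name),
    )


if __name__ == "__main__":
    down = ["media-server", "monitoring", "home-assistant", "network-dns"]
    for step, name in enumerate(restore_order(down), start=1):
        print(f"{step}. {name}")
```

Anything not listed in `PRIORITY` falls into the last tier, which matches the "Everything else" entry.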

Catastrophic Loss (Multiple Devices)

This will take days. Accept it. Don't rush.

Phase 1 — Get Hardware Running (Day 1)

  1. Obtain replacement hardware if needed
  2. Restore Proxmox NUC → Proxmox Recovery
  3. Restore primary NAS → Primary NAS Recovery
  4. Restore backup NAS → Backup NAS Recovery

Phase 2 — Restore Critical Services (Day 1–2)

  1. DNS / network filtering
  2. Remote access (Tailscale / VPN)
  3. Monitoring

Phase 3 — Restore Data (Day 2+)

  1. Pull from offsite backups (may take time for large datasets)
  2. Priority: photos > documents > media
  3. Let it run in the background
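To set expectations for how long the offsite pull might take, a rough estimate can be made from dataset size and download speed. The sketch below assumes decimal units and an arbitrary 80% link efficiency; the dataset sizes and the 100 Mbit/s link are placeholder figures, not measurements from this setup.

```python
def transfer_hours(size_gb, link_mbps, efficiency=0.8):
    """Estimate hours needed to download size_gb over a link_mbps connection.

    efficiency is an assumed allowance for protocol overhead and contention.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical dataset sizes in GB, listed in the priority order given above.
DATASETS = [("photos", 500), ("documents", 50), ("media", 4000)]

if __name__ == "__main__":
    elapsed = 0.0
    for name, size_gb in DATASETS:
        elapsed += transfer_hours(size_gb, link_mbps=100)
        print(f"{name}: finished after roughly {elapsed:.1f} h")
```

Replacing the placeholder sizes with real ones gives a cumulative timeline, which helps decide whether the highest-priority data will be back before lower-priority restores need to start.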

Phase 4 — Applications (Day 3+)

  1. Reinstall containers and restore configs
  2. Test each service
  3. Update documentation with lessons learned

Pre-Disaster Checklist

Do these periodically before you need them:

Monthly

  • [ ] Verify backups are running and completing
  • [ ] Verify offsite sync is completing
  • [ ] Check storage space on all devices
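Parts of the monthly checks could be scripted. The sketch below uses the newest file's modification time as a rough proxy for "backups are completing" and `shutil.disk_usage` for free space. The thresholds (26 hours, 15% free) and the example path are assumptions, and a recent file does not prove the backup is restorable.

```python
import shutil
import time
from pathlib import Path


def newest_file_age_hours(backup_dir, now=None):
    """Return the age in hours of the newest file, or None if there are no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600


def free_fraction(path):
    """Return the fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def check(backup_dir, max_age_hours=26, min_free=0.15):
    """Return a list of warnings; an empty list means both checks passed."""
    warnings = []
    age = newest_file_age_hours(backup_dir)
    if age is None:
        warnings.append(f"{backup_dir}: no backup files found")
    elif age > max_age_hours:
        warnings.append(f"{backup_dir}: newest backup is {age:.0f} h old")
    if free_fraction(backup_dir) < min_free:
        warnings.append(f"{backup_dir}: less than {min_free:.0%} free space")
    return warnings


if __name__ == "__main__":
    # "/mnt/backups" is a placeholder path; substitute the real backup location.
    for line in check("/mnt/backups") or ["all checks passed"]:
        print(line)
```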

Quarterly

  • [ ] Test restoring a VM from backup
  • [ ] Test restoring a file from offsite
  • [ ] Verify documentation is current and accurate
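For the file restore test, one way to confirm the restored copy is intact is to compare checksums against a known-good original. This is a minimal sketch using SHA-256 from the standard library; the function names are illustrative, and it assumes the original file is still available for comparison.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original, restored):
    """Return True when the restored copy is byte-identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

A call such as `restore_matches("photo.jpg", "restored/photo.jpg")` (placeholder paths) returns `False` if even a single byte differs, which is the failure a restore test is meant to catch.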

Annually

  • [ ] Full disaster recovery simulation
  • [ ] Update all recovery procedures

Recovery Log Template

Use this when performing actual recovery — fill it in as you go:

Date: ___________
Scenario: ___________
Cause: ___________

Timeline:
- Problem discovered: ___________
- Recovery started: ___________
- Services restored: ___________
- Full recovery: ___________

What worked:
-

What didn't work:
-

Lessons learned:
-

Documentation to update:
-
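If typing the template out during an incident is a burden, a small script could pre-fill the fields that are known at the start. This is a sketch only: the function name and the choice to stamp "Recovery started" with the current time are assumptions, and everything else is left blank to fill in by hand.

```python
from datetime import datetime

BLANK = "___________"

# Mirrors the template above; only the placeholders are new.
TEMPLATE = """\
Date: {date}
Scenario: {scenario}
Cause: {blank}

Timeline:
- Problem discovered: {discovered}
- Recovery started: {started}
- Services restored: {blank}
- Full recovery: {blank}

What worked:
-

What didn't work:
-

Lessons learned:
-

Documentation to update:
-
"""


def new_recovery_log(scenario, discovered=None, now=None):
    """Return the template with the date, scenario, and start time filled in."""
    now = now or datetime.now()
    return TEMPLATE.format(
        date=now.strftime("%Y-%m-%d"),
        scenario=scenario,
        discovered=discovered or BLANK,
        started=now.strftime("%H:%M"),
        blank=BLANK,
    )


if __name__ == "__main__":
    print(new_recovery_log("Primary NAS failure", discovered="08:15"))
```

Redirecting the output to a file gives a log that can be edited as the recovery proceeds.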

Recovery Complete Checklist

Recovery is not done until:

  • [ ] All critical services operational
  • [ ] Data integrity verified
  • [ ] Backups resuming automatically
  • [ ] Monitoring operational
  • [ ] LESSONS-LEARNED.md updated
  • [ ] Recovery log added to documentation