Disaster Recovery
Last Updated:
Purpose: High-level recovery planning and scenario overview.
Recovery Principles
- Stay calm — panic leads to mistakes
- Assess before acting — understand what's actually broken before touching anything
- Verify backups exist before starting any restore
- One thing at a time — don't change multiple things simultaneously
- Take notes: document what you do as you go, using the log template below
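The "verify backups exist" principle can be turned into a quick pre-restore check. The sketch below assumes backups land as ordinary files somewhere under a single directory. The function names, the directory layout, and the 48-hour freshness threshold are illustrative assumptions, not details of this setup.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age_hours(backup_dir: str, now: Optional[float] = None) -> Optional[float]:
    """Return the age in hours of the newest file under backup_dir.

    Returns None when the directory is missing or contains no files.
    """
    root = Path(backup_dir)
    if not root.is_dir():
        return None
    mtimes = [p.stat().st_mtime for p in root.rglob("*") if p.is_file()]
    if not mtimes:
        return None
    current = time.time() if now is None else now
    return (current - max(mtimes)) / 3600.0


def backup_is_usable(backup_dir: str, max_age_hours: float = 48.0) -> bool:
    """True only if at least one backup file exists and it is recent enough."""
    age = newest_backup_age_hours(backup_dir)
    return age is not None and age <= max_age_hours
```

A check like this only confirms that a recent file is present. It says nothing about whether the backup can actually be restored, which is what the quarterly restore tests are for.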
Accessing This Documentation During an Outage
| Situation | Documentation URL |
|---|---|
| Internet available, home network down | |
| Home network up, internet down | |
| Both down | Printed quick reference |
Scenario Overview
| Scenario | Impact | Recovery Doc |
|---|---|---|
| Power cut — normal recovery | Services may need restarting | Partner Guide → Power Cut |
| Primary NAS failure | Main services down, data at risk | Primary NAS Recovery |
| Backup NAS failure | Backup storage and local docs down | Backup NAS Recovery |
| Proxmox NUC failure | VMs/containers down | Proxmox Recovery |
| Single container/service down | One service unavailable | Restart via Unraid Docker UI |
| Both NAS devices fail | Major data loss risk | See Catastrophic Loss below |
Service Recovery Priority
When several services are down at once, restore them in this order:
1. Network / DNS: everything else depends on this
2. Remote access: so Dan can help remotely if needed
3. Home Assistant: safety and daily routines
4. Monitoring: visibility into what's healthy
5. Everything else
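The ordering above can be sketched as a small lookup plus a stable sort, which is handy if a script or dashboard needs to list what to bring back first. The service keys here (`network_dns`, `media_server`, and so on) are hypothetical labels chosen for illustration. Only the tier order comes from the list above.

```python
# Priority tiers from the list above; a lower number means restore sooner.
# The service keys are hypothetical labels, not real container names.
RECOVERY_PRIORITY = {
    "network_dns": 1,
    "remote_access": 2,
    "home_assistant": 3,
    "monitoring": 4,
}
DEFAULT_PRIORITY = 5  # "Everything else"


def recovery_order(down_services):
    """Sort services by tier; sorted() is stable, so ties keep input order."""
    return sorted(
        down_services,
        key=lambda name: RECOVERY_PRIORITY.get(name, DEFAULT_PRIORITY),
    )


print(recovery_order(["media_server", "monitoring", "network_dns", "home_assistant"]))
# → ['network_dns', 'home_assistant', 'monitoring', 'media_server']
```

Anything not named in the table falls into the last tier, so new services do not need an entry until they matter enough to be prioritised.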
Catastrophic Loss (Multiple Devices)
This will take days. Accept it. Don't rush.
Phase 1: Get Hardware Running (Day 1)
- Obtain replacement hardware if needed
- Restore Proxmox NUC → Proxmox Recovery
- Restore primary NAS → Primary NAS Recovery
- Restore backup NAS → Backup NAS Recovery
Phase 2: Restore Critical Services (Day 1–2)
- DNS / network filtering
- Remote access (Tailscale / VPN)
- Monitoring
Phase 3: Restore Data (Day 2+)
- Pull from offsite backups (may take time for large datasets)
- Priority: photos > documents > media
- Let it run in the background
Phase 4: Applications (Day 3+)
- Reinstall containers and restore configs
- Test each service
- Update documentation with lessons learned
Pre-Disaster Checklist
Do these periodically before you need them:
Monthly
- [ ] Verify backups are running and completing
- [ ] Verify offsite sync is completing
- [ ] Check storage space on all devices
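The storage check in the monthly list can be partly automated. This is a minimal sketch using Python's standard `shutil.disk_usage`, assuming it runs on the device whose mount points are being checked. The 85% warning threshold and the example paths are assumptions for illustration.

```python
import shutil


def percent_used(path: str) -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # some pseudo-filesystems report a zero size
    return 100.0 * usage.used / usage.total


def low_space_report(paths, warn_at_percent: float = 85.0):
    """Return (path, percent_used) pairs at or above the warning threshold."""
    report = []
    for path in paths:
        pct = percent_used(path)
        if pct >= warn_at_percent:
            report.append((path, round(pct, 1)))
    return report
```

For example, `low_space_report(["/", "/mnt/backups"])` would list whichever of those two hypothetical mount points is at least 85% full, and return an empty list when both have headroom.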
Quarterly
- [ ] Test restoring a VM from backup
- [ ] Test restoring a file from offsite
- [ ] Verify documentation is current and accurate
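A restore test is more convincing when the restored file is compared against a known-good copy rather than just opened. The sketch below compares SHA-256 digests using the standard `hashlib` module. It assumes a reference copy of the file is still available locally, and the function names are illustrative.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original_path: str, restored_path: str) -> bool:
    """True when the restored file is byte-for-byte identical to the original."""
    return sha256_of(original_path) == sha256_of(restored_path)
```

The same comparison can serve as a spot check for data integrity after a real recovery, provided a trusted copy or a previously recorded digest exists.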
Annually
- [ ] Full disaster recovery simulation
- [ ] Update all recovery procedures
Recovery Log Template
Use this when performing actual recovery — fill it in as you go:
Date: ___________
Scenario: ___________
Cause: ___________
Timeline:
- Problem discovered: ___________
- Recovery started: ___________
- Services restored: ___________
- Full recovery: ___________
What worked:
-
What didn't work:
-
Lessons learned:
-
Documentation to update:
-
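To avoid hunting for the template mid-outage, a small script can create a pre-filled log file at the moment a problem is discovered. This is a sketch only: the file naming scheme, the `start_recovery_log` function, and the choice to pre-fill the date and discovery time are assumptions layered on the template above.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Mirrors the template above; fields known at creation time are pre-filled.
TEMPLATE = """Date: {date}
Scenario: {scenario}
Cause:
Timeline:
- Problem discovered: {discovered}
- Recovery started:
- Services restored:
- Full recovery:
What worked:
-
What didn't work:
-
Lessons learned:
-
Documentation to update:
-
"""


def start_recovery_log(scenario: str, log_dir: str = ".", now: Optional[datetime] = None) -> Path:
    """Create a new recovery log file and return its path."""
    now = now or datetime.now()
    folder = Path(log_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"recovery-{now:%Y%m%d-%H%M}.txt"
    body = TEMPLATE.format(
        date=f"{now:%Y-%m-%d}",
        scenario=scenario,
        discovered=f"{now:%H:%M}",
    )
    # Mode "x" refuses to overwrite an existing log, so notes are never lost.
    with path.open("x", encoding="utf-8") as handle:
        handle.write(body)
    return path
```

Calling `start_recovery_log("Primary NAS failure")` writes the file into the current directory. Writing it to a location that survives the outage, such as a laptop rather than the NAS itself, is the safer choice.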
Recovery Complete Checklist
Recovery is not done until:
- [ ] All critical services operational
- [ ] Data integrity verified
- [ ] Backups resuming automatically
- [ ] Monitoring operational
- [ ] LESSONS-LEARNED.md updated
- [ ] Recovery log added to documentation