Disaster Recovery

Last Updated:
Purpose: High-level recovery planning and scenario overview.

Recovery Principles

Stay calm: panic leads to mistakes
Assess before acting: understand what is actually broken before touching anything
Verify that backups exist before starting any restore
One thing at a time: don't change several things at once
Take notes: document what you do as you go, using the Recovery Log Template below
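
The "verify backups exist" principle can be turned into a quick pre-restore check. The following is a minimal sketch, not the setup's actual tooling: it assumes backups land as files under a single directory, and the 48-hour freshness window and the function names are illustrative assumptions.

```python
from __future__ import annotations

import time
from pathlib import Path


def newest_backup_age_hours(backup_dir: Path) -> float | None:
    """Return the age in hours of the newest file under backup_dir.

    Returns None when the directory contains no files at all.
    """
    files = [p for p in backup_dir.rglob("*") if p.is_file()]
    if not files:
        return None
    newest_mtime = max(p.stat().st_mtime for p in files)
    return (time.time() - newest_mtime) / 3600.0


def backup_is_usable(backup_dir: Path, max_age_hours: float = 48.0) -> bool:
    """True when backup_dir exists and holds a file newer than max_age_hours.

    The 48-hour default is an assumption; match it to the real backup schedule.
    """
    if not backup_dir.is_dir():
        return False
    age = newest_backup_age_hours(backup_dir)
    return age is not None and age <= max_age_hours
```

A `False` result means stop and investigate before touching the broken system. A `True` result only shows that a recent file exists, not that it restores cleanly.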

Accessing This Documentation During an Outage

Situation	Documentation URL
Internet available, home network down
Home network up, internet down
Both down	Printed quick reference

Scenario Overview

Scenario	Impact	Recovery Doc
Power cut — normal recovery	Services may need restarting	Partner Guide → Power Cut
Primary NAS failure	Main services down, data at risk	Primary NAS Recovery
Backup NAS failure	Backup storage and local docs down	Backup NAS Recovery
Proxmox NUC failure	VMs/containers down	Proxmox Recovery
Single container/service down	One service unavailable	Restart via Unraid Docker UI
Both NAS devices fail	Major data loss risk	See Catastrophic Loss below

Service Recovery Priority

When several services are down at once, restore them in this order:

Network / DNS — everything else depends on this
Remote access — so Dan can help remotely if needed
Home Assistant — safety and daily routines
Monitoring — visibility into what's healthy
Everything else
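
The ordering above can be sketched as a small sort over whatever is currently down. This is only an illustration: the tier keys mirror the list, while the service names in the example (`pihole`, `grafana`, `jellyfin`, and so on) are hypothetical stand-ins for the real containers.

```python
from __future__ import annotations

# Tiers copied from the priority list; a lower number is restored first.
PRIORITY_TIERS = {
    "network_dns": 1,
    "remote_access": 2,
    "home_assistant": 3,
    "monitoring": 4,
}
DEFAULT_TIER = 5  # "Everything else"


def restore_order(down_services: dict[str, str]) -> list[str]:
    """Sort service names by tier, then alphabetically within a tier.

    down_services maps a service name to its category key.
    Unknown categories fall into the "everything else" tier.
    """
    return sorted(
        down_services,
        key=lambda name: (PRIORITY_TIERS.get(down_services[name], DEFAULT_TIER), name),
    )


if __name__ == "__main__":
    # Hypothetical service names, for illustration only.
    down = {
        "grafana": "monitoring",
        "jellyfin": "media",
        "pihole": "network_dns",
        "tailscale": "remote_access",
        "homeassistant": "home_assistant",
    }
    print(restore_order(down))
    # ['pihole', 'tailscale', 'homeassistant', 'grafana', 'jellyfin']
```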

Catastrophic Loss (Multiple Devices)

This will take days. Accept it. Don't rush.

Phase 1: Get Hardware Running (Day 1)

Obtain replacement hardware if needed
Restore Proxmox NUC → Proxmox Recovery
Restore primary NAS → Primary NAS Recovery
Restore backup NAS → Backup NAS Recovery

Phase 2: Restore Critical Services (Day 1–2)

DNS / network filtering
Remote access (Tailscale / VPN)
Monitoring

Phase 3: Restore Data (Day 2+)

Pull from offsite backups (may take time for large datasets)
Priority: photos > documents > media
Let it run in the background

Phase 4: Applications (Day 3+)

Reinstall containers and restore configs
Test each service
Update documentation with lessons learned

Pre-Disaster Checklist

Run these checks on a schedule, before you need them:

Monthly

[ ] Verify backups are running and completing
[ ] Verify offsite sync is completing
[ ] Check storage space on all devices
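
The storage check can be scripted rather than eyeballed. Below is a minimal sketch using Python's standard `shutil.disk_usage`; the 85% threshold and the function names are assumptions, and the mount points you pass in depend on how each device is laid out.

```python
import shutil


def percent_used(path: str) -> float:
    """Return how full the filesystem containing path is, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def volumes_over_threshold(paths, threshold_percent=85.0):
    """Return {path: percent_used} for every path at or above the threshold.

    The 85% default is an assumed warning level, not a documented limit.
    """
    report = {}
    for path in paths:
        used = percent_used(path)
        if used >= threshold_percent:
            report[path] = round(used, 1)
    return report


if __name__ == "__main__":
    # "/" is a placeholder; replace it with the real share or pool mount points.
    print(volumes_over_threshold(["/"]))
```

An empty result means every listed volume is below the threshold.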

Quarterly

[ ] Test restoring a VM from backup
[ ] Test restoring a file from offsite
[ ] Verify documentation is current and accurate
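
A restore test is more convincing when the restored file is compared against a known-good copy instead of just being opened. Here is a minimal sketch of that comparison using SHA-256 checksums; it assumes you still have the original file to compare against, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """True when the restored file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

If the original is gone, which is the usual case in a real recovery, a checksum recorded at backup time would have to stand in for it. This page does not say whether such checksums are kept.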

Annually

[ ] Full disaster recovery simulation
[ ] Update all recovery procedures

Recovery Log Template

Use this when performing actual recovery — fill it in as you go:

Date: ___________
Scenario: ___________
Cause: ___________

Timeline:
- Problem discovered: ___________
- Recovery started: ___________
- Services restored: ___________
- Full recovery: ___________

What worked:
-

What didn't work:
-

Lessons learned:
-

Documentation to update:
-
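
To avoid retyping the template under pressure, a small helper can stamp out a copy with the date and discovery time already filled in. This is a sketch under assumptions: the function name and the date and time formats are illustrative, and where the log is saved is left to you.

```python
from __future__ import annotations

from datetime import datetime

# Same fields as the template above; only date, scenario, and discovery time are pre-filled.
LOG_TEMPLATE = """\
Date: {date}
Scenario: {scenario}
Cause: ___________

Timeline:
- Problem discovered: {discovered}
- Recovery started: ___________
- Services restored: ___________
- Full recovery: ___________

What worked:
-

What didn't work:
-

Lessons learned:
-

Documentation to update:
-
"""


def new_recovery_log(scenario: str, discovered: datetime | None = None) -> str:
    """Return the log template with the date, scenario, and discovery time filled in."""
    when = discovered if discovered is not None else datetime.now()
    return LOG_TEMPLATE.format(
        date=when.strftime("%Y-%m-%d"),
        scenario=scenario,
        discovered=when.strftime("%H:%M"),
    )


if __name__ == "__main__":
    print(new_recovery_log("Primary NAS failure"))
```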

Recovery Complete Checklist

Recovery is not done until:

- [ ] All critical services operational
- [ ] Data integrity verified
- [ ] Backups resuming automatically
- [ ] Monitoring operational
- [ ] LESSONS-LEARNED.md updated
- [ ] Recovery log added to documentation