Disaster Recovery
Disaster Recovery Statement
This page describes OBIEN's disaster recovery policy, backup strategy, and recovery procedures for each failure scenario.
Recovery Objectives
RTO (Recovery Time Objective)
4 hours: maximum time until primary systems are restored and operational
RPO (Recovery Point Objective)
24 hours: maximum acceptable data loss window
Backup Strategy
| Asset | Method | Location | Frequency |
|---|---|---|---|
| Database (.json) | File-based .bak rotation | Local + off-site | Daily |
| Source code | Git | GitHub (private repository) | Every commit |
| Secrets / Env vars | Vercel Environment Variables | Vercel platform (encrypted) | On change |
| Client deliverables | Separate clients/ directory | Git-tracked per project | Per delivery |
Database backup files (.bak) are excluded from the Git repository via .gitignore. Backup rotation retains the last 7 daily snapshots.
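The 7-snapshot rotation can be sketched in TypeScript as below. This is a minimal illustration under stated assumptions, not OBIEN's actual backup script. The snapshot naming scheme (`<db file>.<YYYY-MM-DD>.bak`) and the names `selectBackupsToPrune` and `rotateBackups` are hypothetical. The sketch copies the live JSON file to a dated `.bak` and then deletes everything older than the newest seven snapshots. The off-site copy is not shown.

```typescript
import { copyFileSync, readdirSync, unlinkSync } from "node:fs";
import { basename, join } from "node:path";

// Hypothetical naming scheme: <db file>.<YYYY-MM-DD>.bak, e.g. data.json.2026-04-26.bak
const BACKUP_PATTERN = /^(.+)\.(\d{4}-\d{2}-\d{2})\.bak$/;

/** Returns the snapshots of `dbName` that fall outside the newest `keep`. */
function selectBackupsToPrune(files: string[], dbName: string, keep = 7): string[] {
  const snapshots: { file: string; date: string }[] = [];
  for (const file of files) {
    const m = BACKUP_PATTERN.exec(file);
    if (m !== null && m[1] === dbName) snapshots.push({ file, date: m[2] });
  }
  // ISO dates compare correctly as plain strings; sort newest first.
  snapshots.sort((a, b) => (a.date < b.date ? 1 : a.date > b.date ? -1 : 0));
  return snapshots.slice(keep).map((s) => s.file);
}

/** Copies the live database to today's snapshot, then deletes snapshots beyond `keep`. */
function rotateBackups(dbPath: string, backupDir: string, keep = 7): void {
  const dbName = basename(dbPath);
  const today = new Date().toISOString().slice(0, 10); // UTC date; a second run the same day overwrites
  copyFileSync(dbPath, join(backupDir, `${dbName}.${today}.bak`));
  for (const stale of selectBackupsToPrune(readdirSync(backupDir), dbName, keep)) {
    unlinkSync(join(backupDir, stale));
  }
}
```

The pruning decision is kept in a pure function so it can be tested without touching the filesystem. A daily scheduled job would call `rotateBackups` once per day.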
Failure Scenarios & Recovery
1. Database Corruption
Detection: Application errors on data read; Sentry alert
- Identify the last known good .bak file
- Stop writes (set maintenance mode if applicable)
- Restore from .bak file
- Verify data integrity
- Resume normal operation
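The "identify the last known good .bak" and "verify data integrity" steps can be sketched as follows. The sketch assumes three things:

- The database is a single JSON file.
- Snapshots carry date-stamped names, so sorting filenames orders them by age.
- Writes have already been stopped by the caller.

The names `isValidSnapshot`, `findLastKnownGood`, and `restoreDatabase` are hypothetical. The integrity check here only confirms that a file parses as a JSON object or array. A production check would also validate the expected schema.

```typescript
import { copyFileSync, readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

/** Minimal integrity check: the snapshot must parse as JSON and be an object or array. */
function isValidSnapshot(text: string): boolean {
  try {
    const data: unknown = JSON.parse(text);
    return typeof data === "object" && data !== null;
  } catch {
    return false; // truncated or otherwise malformed JSON
  }
}

/** Returns the first snapshot (in newest-first order) whose contents pass the check. */
function findLastKnownGood(namesNewestFirst: string[], read: (name: string) => string): string | null {
  for (const name of namesNewestFirst) {
    if (isValidSnapshot(read(name))) return name;
  }
  return null;
}

/** Restores the newest valid .bak over the live file. Writes must already be stopped. */
function restoreDatabase(dbPath: string, backupDir: string): string {
  // Assumes date-stamped names, so a reverse lexicographic sort yields newest first.
  const names = readdirSync(backupDir).filter((f) => f.endsWith(".bak")).sort().reverse();
  const good = findLastKnownGood(names, (n) => readFileSync(join(backupDir, n), "utf8"));
  if (good === null) throw new Error("No valid local .bak snapshot; fall back to the off-site copy");
  copyFileSync(join(backupDir, good), dbPath);
  return good;
}
```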
Expected RTO: < 1 hour
2. Vercel Platform Outage
Detection: Uptime monitor alert; user reports
- Check Vercel Status for incident details
- If the outage is prolonged (> 2 hours): deploy from GitHub to an alternative platform (e.g., Cloudflare Pages)
- Once the alternative deployment is verified, redirect DNS via Cloudflare to the alternative host
Expected RTO: 2–4 hours
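The "> 2 hours" rule is easier to apply consistently if outage duration is tracked mechanically. The sketch below records health-check results and reports when the outage has lasted longer than two hours. The names `recordCheck`, `isHealthy`, and `OutageState` are hypothetical. It assumes Node.js 18 or later for the global `fetch` and `AbortSignal.timeout`. It only signals that the threshold has passed; it does not change DNS or trigger a deployment.

```typescript
// "> 2 hours" threshold from the Vercel outage procedure.
const FAILOVER_THRESHOLD_MS = 2 * 60 * 60 * 1000;

interface OutageState {
  startedAt: number | null; // epoch ms of the first failed check in the current outage
}

/** Records one health-check result; returns true once the outage exceeds the threshold. */
function recordCheck(state: OutageState, healthy: boolean, nowMs: number): boolean {
  if (healthy) {
    state.startedAt = null; // outage over; reset the clock
    return false;
  }
  if (state.startedAt === null) state.startedAt = nowMs;
  return nowMs - state.startedAt > FAILOVER_THRESHOLD_MS;
}

/** Treats any non-2xx response, network error, or 10 s timeout as unhealthy. */
async function isHealthy(url: string): Promise<boolean> {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(10_000) });
    return res.ok;
  } catch {
    return false;
  }
}
```

A scheduled job would call `isHealthy` at a fixed interval and pass each result to `recordCheck`. A `true` result is the cue to begin the alternative deployment.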
3. Upstash Redis Outage (Rate Limiting)
Detection: Rate limit errors in logs; Sentry alert
- Automatic: rate-limit.ts includes an in-memory fallback, so rate limiting continues while Redis is unavailable (enforced per instance rather than shared across instances)
- Manual intervention not required unless outage exceeds 24 hours
Expected RTO: 0 (automatic fallback)
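The contents of rate-limit.ts are not reproduced in this document. As an illustration of the pattern only, the sketch below wraps a primary limiter (for example, one backed by Upstash Redis) so that any error falls back to a fixed-window counter held in process memory. The names and the fixed-window algorithm are assumptions. Serverless instances do not share memory, so the fallback limits requests per instance rather than globally.

```typescript
/** Each limiter returns true if the request identified by `key` is allowed. */
type AsyncLimiter = (key: string) => Promise<boolean>;
type SyncLimiter = (key: string) => boolean;

/** Fixed-window counter in process memory: per instance, and reset on cold start. */
function createMemoryLimiter(limit: number, windowMs: number, now: () => number = Date.now): SyncLimiter {
  const windows = new Map<string, { start: number; count: number }>();
  return (key) => {
    const t = now();
    const w = windows.get(key);
    if (w === undefined || t - w.start >= windowMs) {
      windows.set(key, { start: t, count: 1 }); // open a new window for this key
      return limit >= 1;
    }
    w.count += 1;
    return w.count <= limit;
  };
}

/** Uses the primary (e.g. Redis-backed) limiter; on any error, reports it and uses the fallback. */
function withFallback(
  primary: AsyncLimiter,
  fallback: SyncLimiter,
  onError: (e: unknown) => void = console.error,
): AsyncLimiter {
  return async (key) => {
    try {
      return await primary(key);
    } catch (e) {
      onError(e); // keeps the failure visible in logs, the detection signal for this scenario
      return fallback(key);
    }
  };
}
```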
4. Source Code Loss
Detection: Repository access failure
- Clone from GitHub remote
- Restore environment variables from Vercel dashboard
- Redeploy
Expected RTO: < 30 minutes
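Before redeploying, a quick preflight check can confirm that every required environment variable is present. This matters most when deploying to a platform other than Vercel, where the variables must be re-entered. The variable list is hypothetical:

- SESSION_VERSION is named in this document.
- The Upstash and Sentry names follow common conventions and are not confirmed for this project.

`missingEnv` is an illustrative helper.

```typescript
// Hypothetical list: adjust to the variables the project actually reads.
const REQUIRED_ENV: readonly string[] = [
  "SESSION_VERSION",
  "UPSTASH_REDIS_REST_URL",
  "UPSTASH_REDIS_REST_TOKEN",
  "SENTRY_DSN",
];

/** Returns the required variable names that are unset or blank in `env`. */
function missingEnv(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV,
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}
```

A deploy script would call `missingEnv(process.env)` and abort if the result is non-empty.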
5. Secret / Credential Compromise
Detection: Unauthorized access alerts; anomalous API usage
- Immediately rotate all compromised credentials at their issuing provider and update the corresponding Vercel environment variables
- Increment SESSION_VERSION to invalidate all active admin sessions
- Review access logs for unauthorized actions
- Deploy updated configuration
- Notify affected parties within 24 hours
Expected RTO: < 2 hours
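Incrementing SESSION_VERSION invalidates sessions only if two conditions hold:

- Each session records the version it was issued under.
- Every request compares that value with the current one.

The sketch below shows that check under assumptions. The `AdminSession` shape and both function names are hypothetical, and the session is assumed to be signed so a client cannot edit its own version. Vercel applies environment variable changes only to new deployments, which is why the procedure ends with a redeploy.

```typescript
/** Hypothetical session payload; assumes it is carried in a signed, tamper-proof token. */
interface AdminSession {
  userId: string;
  version: number; // SESSION_VERSION at the time the session was issued
}

/** Reads the current SESSION_VERSION; fails closed if it is missing or malformed. */
function currentSessionVersion(env: Record<string, string | undefined> = process.env): number {
  const v = Number.parseInt(env["SESSION_VERSION"] ?? "", 10);
  if (!Number.isInteger(v)) throw new Error("SESSION_VERSION is missing or not an integer");
  return v;
}

/** A session is accepted only if it was issued under the current version. */
function isSessionCurrent(session: AdminSession, currentVersion: number): boolean {
  return session.version === currentVersion;
}
```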
Communication Protocol
| Phase | Action | Channel |
|---|---|---|
| Detection | Internal alert triggered | Sentry / Uptime monitor |
| Triage | Assess severity and impact | Internal |
| Notification | Inform affected clients | |
| Resolution | Execute recovery procedure | Per scenario above |
| Postmortem | Publish incident report within 24 hours | Email to affected parties |
Roles & Review
Responsibilities
- Representative (Yoshioka Takuo): incident commander; final escalation point
- AI Operations: automated monitoring, alerting, and initial triage
Review Schedule
- Last reviewed: 2026-04-26
- Next review: Monthly (or after any incident)
- Review scope: Recovery objectives, backup verification, procedure walkthrough
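For the backup-verification part of the review, a small check can compare the newest snapshot's age against the 24-hour RPO. This is a sketch under assumptions. It treats file modification time as the snapshot time and assumes backups live in a single local directory. `newestBackupAgeMs` and `meetsRpo` are hypothetical names.

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

const RPO_MS = 24 * 60 * 60 * 1000; // RPO from Recovery Objectives: 24 hours

/** Age in ms of the most recently modified .bak file, or null if none exist. */
function newestBackupAgeMs(backupDir: string, nowMs: number = Date.now()): number | null {
  let newest = -Infinity;
  for (const f of readdirSync(backupDir)) {
    if (f.endsWith(".bak")) newest = Math.max(newest, statSync(join(backupDir, f)).mtimeMs);
  }
  return newest === -Infinity ? null : nowMs - newest;
}

/** True if the newest backup is recent enough to honour the RPO. */
function meetsRpo(ageMs: number | null, rpoMs: number = RPO_MS): boolean {
  return ageMs !== null && ageMs <= rpoMs;
}
```

With daily backups, the newest snapshot is nearly 24 hours old just before the next run. Any scheduling delay can therefore push it past a strict 24-hour check, which is worth keeping in mind when interpreting a failure.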