Disaster Recovery

Disaster Recovery Statement

OBIEN の災害復旧方針・バックアップ戦略・障害シナリオ別復旧手順を記載しています。

Recovery Objectives

RTO (Recovery Time Objective)

4 hours

Primary systems restored and operational

RPO (Recovery Point Objective)

24 hours

Maximum acceptable data loss window

Backup Strategy

AssetMethodLocationFrequency
Database (.json)File-based .bak rotationLocal + off-siteDaily
Source codeGitGitHub (private repository)Every commit
Secrets / Env varsVercel Environment VariablesVercel platform (encrypted)On change
Client deliverablesSeparate clients/ directoryGit-tracked per projectPer delivery

Database backup files (.bak) are excluded from the Git repository via .gitignore. Backup rotation retains the last 7 daily snapshots.

Failure Scenarios & Recovery

1. Database Corruption

Detection: Application errors on data read; Sentry alert

  1. Identify the last known good .bak file
  2. Stop writes (set maintenance mode if applicable)
  3. Restore from .bak file
  4. Verify data integrity
  5. Resume normal operation

Expected RTO: < 1 hour

2. Vercel Platform Outage

Detection: Uptime monitor alert; user reports

  1. Check Vercel Status for incident details
  2. If prolonged (> 2 hours): redirect DNS via Cloudflare to alternative host
  3. Deploy from GitHub to alternative platform (e.g., Cloudflare Pages)

Expected RTO: 2–4 hours

3. Upstash Redis Outage (Rate Limiting)

Detection: Rate limit errors in logs; Sentry alert

  1. Automatic: In-memory fallback is built into rate-limit.ts — the system degrades gracefully
  2. Manual intervention not required unless outage exceeds 24 hours

Expected RTO: 0 (automatic fallback)

4. Source Code Loss

Detection: Repository access failure

  1. Clone from GitHub remote
  2. Restore environment variables from Vercel dashboard
  3. Redeploy

Expected RTO: < 30 minutes

5. Secret / Credential Compromise

Detection: Unauthorized access alerts; anomalous API usage

  1. Immediately rotate all compromised credentials via Vercel dashboard
  2. Increment SESSION_VERSION to invalidate all active admin sessions
  3. Review access logs for unauthorized actions
  4. Deploy updated configuration
  5. Notify affected parties within 24 hours

Expected RTO: < 2 hours

Communication Protocol

PhaseActionChannel
DetectionInternal alert triggeredSentry / Uptime monitor
TriageAssess severity and impactInternal
NotificationInform affected clientsEmail
ResolutionExecute recovery procedurePer scenario above
PostmortemPublish incident report within 24hEmail to affected parties

Roles & Review

Responsibilities

  • Representative (Yoshioka Takuo) — Incident commander; final escalation point
  • AI Operations — Automated monitoring, alerting, and initial triage

Review Schedule

  • Last reviewed: 2026-04-26
  • Next review: Monthly (or after any incident)
  • Review scope: Recovery objectives, backup verification, procedure walkthrough