EquiTrail — Disaster Recovery Plan

Owner: Nossie Consultancy B.V. · Last updated: 2026-06-01

Review this document every quarter and after every significant infrastructure change.


1. Scope & Objectives

Systems covered

System | Criticality | Notes
Firebase (Auth + Firestore + Functions) | CRITICAL | User accounts, rides, horses, settings
Oracle Cloud routing server (GraphHopper) | HIGH | Navigation; app works without it (graceful degradation)
Website (equitrail.horse, Plesk) | MEDIUM | Marketing/PRO page; app unaffected
Discord bot (Oracle server) | LOW | Community; app unaffected
Android keystore (equitrail.jks) | CRITICAL | Loss = permanent inability to update Play Store app
Play Console service account | HIGH | Loss = no automated uploads (manual still works)
Git repository (Azure DevOps) | HIGH | Source of truth for all code
Oracle Cloud backup server | MEDIUM | Secondary store for secrets + docs

Recovery Objectives

System | RTO (Recovery Time) | RPO (Recovery Point)
Firebase Firestore | 4 hours | 24 hours (daily export)
Routing server | 2 hours | No data loss (stateless)
Android keystore | 1 hour | No data loss (backed up to Oracle + 2 offline copies)
Website | 30 minutes | No data loss (static files in git + Oracle backup)
Discord bot | 4 hours | No data loss (code in git, routes on server)
Git repository | 2 hours | No data loss (Azure DevOps + local clone)
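
The 24-hour Firestore RPO only holds while the newest export is less than 24 hours old. A minimal sketch of that check follows, assuming the caller supplies the timestamp of the newest export as Unix epoch seconds. How that timestamp is obtained (for example from a bucket listing) is not specified in this plan, and the function name `rpo_status` is illustrative.

```shell
#!/bin/sh
# rpo_status LAST_EXPORT_EPOCH NOW_EPOCH RPO_HOURS
# Prints "OK" when the newest export is within the RPO window,
# otherwise "VIOLATED". All three arguments are integers.
rpo_status() {
  age_seconds=$(( $2 - $1 ))
  limit_seconds=$(( $3 * 3600 ))
  if [ "$age_seconds" -le "$limit_seconds" ]; then
    echo "OK"
  else
    echo "VIOLATED"
  fi
}

# Example: an export taken 25 hours ago, checked against the 24-hour RPO.
now=$(date +%s)
rpo_status "$(( now - 25 * 3600 ))" "$now" 24
```

The same function could be reused for any other asset by passing a different RPO in hours.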

2. Critical Asset Inventory

2a. Irreplaceable Assets (MUST protect — loss = catastrophic)

Asset | Location | Backup
android/equitrail.jks | workdir + Oracle backup | ALSO keep offline encrypted copy
android/key.properties | workdir + Oracle backup | Contains keystore passwords
android/play_service_account.json | workdir + Oracle backup | Regeneratable via Google Cloud
android/app/google-services.json | workdir + Oracle backup | Re-downloadable from Firebase console
ios/Runner/GoogleService-Info.plist | workdir + Oracle backup | Re-downloadable from Firebase console
credentials.env | workdir + Oracle backup | Master credentials file
Firebase user data (Firestore) | Firebase cloud | Daily export to Cloud Storage

WARNING: Losing android/equitrail.jks without a backup means you CANNOT update the Play Store app ever again. You would need to publish a brand-new app with a new package name, losing all install history and reviews. This file must have 3 copies: (1) workdir, (2) Oracle backup server, (3) offline encrypted (USB/1Password).
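
One way to confirm that the three keystore copies are still identical is a byte-for-byte comparison. The sketch below assumes every copy has first been fetched to a local path. The function name `copies_match` and the example paths are hypothetical.

```shell
#!/bin/sh
# copies_match REFERENCE COPY...
# Returns 0 only if every COPY exists and is byte-identical to REFERENCE.
copies_match() {
  reference=$1
  shift
  [ -f "$reference" ] || { echo "missing: $reference" >&2; return 1; }
  for copy in "$@"; do
    # cmp -s is silent and exits non-zero on any difference or missing file.
    if ! cmp -s "$reference" "$copy"; then
      echo "mismatch or missing: $copy" >&2
      return 1
    fi
  done
  return 0
}

# Example (hypothetical paths for the Oracle and offline copies):
# copies_match android/equitrail.jks /tmp/oracle/equitrail.jks /Volumes/USB/equitrail.jks
```

A mismatch does not say which copy is the good one. The restored file still needs a keytool check before it is trusted.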

2b. Recoverable Assets

Asset | Recovery Method | Time
Source code | Clone from Azure DevOps | 5 min
Website HTML | Pull from git | 5 min
Discord bot code | Pull from git | 5 min
GraphHopper server | Re-provision Oracle Cloud instance + ansible/manual setup | 2 hours
Firebase project config | Re-download from Firebase console | 15 min

3. Failure Scenarios & Runbooks


Scenario A: Firebase Outage (Google-side)

Symptoms: App login fails, rides not syncing, Firestore queries time out.

Impact: HIGH — users cannot log in, cloud sync broken. Local GPS tracking still works.

Runbook:
1. Check https://status.firebase.google.com to confirm it is a Google-side outage.
2. Post in Discord #announcements: "We are aware of a temporary cloud sync issue. Local tracking continues to work. No data is lost."
3. No code change needed: the app is designed with offline-first Hive storage.
4. Monitor the Firebase status page.
5. When restored: CloudSyncService will auto-sync on next app open.
6. If the outage lasts > 6 hours: post an update in Discord.

Prevention: None beyond Google's own SLAs. App architecture (offline-first Hive) already mitigates impact.


Scenario B: Firebase Project Accidentally Deleted

Symptoms: Auth returns 404, Firestore returns 404 on all collections.

Impact: CATASTROPHIC — all user data lost if no Firestore export exists.

Runbook:
1. Immediately check Google Cloud console → Firebase → check 30-day project deletion window.
2. If within 30 days: request project restoration via Google Cloud Support.
3. Restore from latest Firestore export (see Backup Plan → Section 4).
4. Restore google-services.json and GoogleService-Info.plist from Oracle backup.
5. Re-deploy Firebase Functions if any were lost.
6. Push emergency app update disabling Firebase features until restored.
7. Notify users via Discord + push notification.

Prevention:
  • Enable Firebase project delete protection (Console → Project Settings → delete protection).
  • Daily Firestore export to Cloud Storage bucket in europe-west4.
  • Only nossiej@gmail.com has Owner role; no other Owner accounts.
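
Restoring "the latest Firestore export" first requires identifying it. The sketch below assumes exports are stored under date-named prefixes (YYYYMMDD), as the manual export command in Appendix C produces. Exports written by the scheduled job may be named differently, in which case the pattern needs adjusting. The function name `latest_export` is illustrative.

```shell
#!/bin/sh
# latest_export: reads bucket listing lines on stdin and prints the
# newest prefix whose last path component is an 8-digit date (YYYYMMDD).
# Lexicographic sort order equals chronological order for this format.
latest_export() {
  grep -E '/[0-9]{8}/?$' | sort | tail -n 1
}

# Example (run manually, requires gsutil and gcloud to be authenticated):
# newest=$(gsutil ls gs://equitrail-backups/ | latest_export)
# gcloud firestore import "$newest" --project=equitrail
```

An import overwrites documents that share an ID with the export. It does not delete documents that were created after the export was taken.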


Scenario C: Routing Server Down (GraphHopper on Oracle Cloud)

Symptoms: Navigation tab shows error, route planning fails. GPS tracking still works.

Impact: MEDIUM — navigation feature unavailable. All tracking/history/social unaffected.

Runbook:
1. SSH to server: ssh equitrail@100.126.14.49
   • If SSH fails → Tailscale issue: check Tailscale status on Mac (tailscale status)
   • If Tailscale OK but SSH fails → Oracle Cloud may have rebooted, check OCI console
2. Check GraphHopper: curl http://localhost:8989/health
3. Restart if needed: sudo systemctl restart graphhopper
4. Check logs: sudo journalctl -u graphhopper -n 100
5. If disk full: df -h → clean old GH logs: sudo journalctl --vacuum-time=7d
6. If OOM: free -m → increase swap or restart
7. If instance terminated: re-provision (see Appendix A)
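
Once SSH works, steps 2 to 6 can be condensed into a small triage helper run on the routing server. This is a sketch under stated assumptions: the thresholds (90 % disk usage, 200 MiB available memory) and the function names are illustrative and not taken from this plan. Disk and memory are checked before the restart because restarting a service on a full disk tends to fail again.

```shell
#!/bin/sh
# next_step HEALTH DISK_PCT AVAIL_MB
# HEALTH is "up" or "down", DISK_PCT is root filesystem usage in percent,
# AVAIL_MB is available memory in MiB. Thresholds are assumptions.
next_step() {
  if [ "$2" -ge 90 ]; then
    echo "disk-full: sudo journalctl --vacuum-time=7d"
  elif [ "$3" -lt 200 ]; then
    echo "low-memory: increase swap or restart"
  elif [ "$1" != "up" ]; then
    echo "unhealthy: sudo systemctl restart graphhopper"
  else
    echo "healthy: no action"
  fi
}

# Collect the inputs on the routing server (Linux) and print the suggestion.
collect_and_triage() {
  if curl -fsS --max-time 5 http://localhost:8989/health >/dev/null 2>&1; then
    health=up
  else
    health=down
  fi
  disk_pct=$(df -P / | awk 'NR==2 { gsub("%", "", $5); print $5 }')
  avail_mb=$(free -m | awk '/^Mem:/ { print $7 }')
  next_step "$health" "$disk_pct" "$avail_mb"
}
```

Calling `collect_and_triage` prints a single suggested action. It deliberately changes nothing on the server.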

Fallback (in app): RoutingService falls back to showing an error toast. Navigation keeps working in degraded mode (GPS track only, no turn-by-turn). No code change needed.

Future mitigation: Add secondary OSRM fallback in routing_service.dart (#101).


Scenario D: Android Keystore Lost

Symptoms: flutter build appbundle --release fails to sign, or keystore file missing.

Impact: CATASTROPHIC — cannot publish updates to Play Store without this exact keystore.

Runbook:
1. Check Oracle backup: ssh equitrail@100.126.14.49 "ls ~/equitrail-backup/latest/android/"
2. Restore: scp equitrail@100.126.14.49:~/equitrail-backup/latest/android/equitrail.jks android/
3. Restore passwords from credentials.env backup.
4. Verify: keytool -list -v -keystore android/equitrail.jks (keytool prompts for the store password; keep the password out of this document and out of shell history)
5. Test build: flutter build appbundle --release

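If keystore cannot be recovered:
1. Contact Google Play Support. They cannot restore a lost keystore for you. If the app is enrolled in Play App Signing, however, they can register a new upload key, so confirm the enrollment status first.
2. Otherwise, must publish new app with new package name (e.g., com.nossie.equitrail2).
3. Notify all users to migrate.
4. This is a months-long effort: PREVENT THIS AT ALL COSTS.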

Prevention:
  • Oracle backup runs every session exit (auto hook).
  • Manual offline copy: export android/equitrail.jks to password manager (1Password) or encrypted USB.
  • Run bash scripts/verify_backup.sh monthly to confirm backup integrity.
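
As an illustration of what a monthly integrity check could cover, the sketch below reports which Section 2a assets are absent or empty in a backup directory. It is not the contents of scripts/verify_backup.sh. It assumes the backup mirrors the working-directory layout, and the function name `missing_assets` is hypothetical.

```shell
#!/bin/sh
# missing_assets BACKUP_DIR
# Prints one line per Section 2a asset that is absent or empty in BACKUP_DIR
# and returns non-zero if any is missing. The mirrored directory layout is
# an assumption of this sketch.
missing_assets() {
  backup_dir=$1
  missing=0
  for asset in \
    android/equitrail.jks \
    android/key.properties \
    android/play_service_account.json \
    android/app/google-services.json \
    ios/Runner/GoogleService-Info.plist \
    credentials.env
  do
    # -s is true only for files that exist and are larger than zero bytes.
    if [ ! -s "$backup_dir/$asset" ]; then
      echo "MISSING: $asset"
      missing=1
    fi
  done
  return "$missing"
}

# Example: missing_assets "$HOME/equitrail-backup/latest"
```

A presence check like this catches silent backup gaps. It does not prove that the keystore itself is valid, which still needs the keytool verification from the runbook.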


Scenario E: Azure DevOps Repository Unavailable

Symptoms: git push / git pull fails to dev.azure.com.

Impact: MEDIUM — code still locally available, no immediate app outage.

Runbook:
1. Check https://status.dev.azure.com to confirm a Microsoft-side outage.
2. Continue working locally: all commits are safe in local .git/.
3. When restored: git push origin main
4. If Azure DevOps project deleted: restore from local clone (full history intact).
5. Long-term: push mirror to GitHub as secondary remote (see Appendix B).


Scenario F: Website Down (Plesk / equitrail.horse)

Symptoms: equitrail.horse returns error or blank page.

Impact: LOW — app functions fully without website. PRO purchase page affected.

Runbook:
1. Check if it's DNS: curl -I http://136.144.178.202 (direct IP)
2. If IP works but domain doesn't: DNS issue, check domain registrar DNS config.
3. If IP also fails: Plesk server issue, check Plesk control panel.
4. If blank page: 0-byte file issue, re-deploy: bash scripts/deploy_website.sh
5. Emergency: website static files are in website/ in git and can be hosted anywhere in minutes.
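
The DNS-versus-server decision in steps 1 to 3 can be sketched as a comparison of two HTTP status codes. The function names are illustrative. The sketch does not cover the blank-page case in step 4, because a 0-byte page still returns a success status.

```shell
#!/bin/sh
# classify_outage DOMAIN_CODE IP_CODE
# Arguments are HTTP status codes as printed by curl's %{http_code};
# curl prints 000 when no HTTP response was received at all.
classify_outage() {
  case $1 in
    2??|3??) echo "ok" ;;
    000)
      # No response via the domain: DNS if the direct IP still answers.
      case $2 in
        2??|3??) echo "dns" ;;
        *) echo "server" ;;
      esac
      ;;
    *) echo "server" ;;
  esac
}

# http_code URL: print the status code of a HEAD request (000 on failure).
http_code() {
  curl -s -o /dev/null -I --max-time 10 -w '%{http_code}' "$1"
}

# Example:
# classify_outage "$(http_code https://equitrail.horse)" "$(http_code http://136.144.178.202)"
```

A request to the bare IP may be answered by a default virtual host rather than the real site, so a result of "dns" is a hint to check the registrar, not proof.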

Nuclear fallback: Deploy website/ to Netlify/Vercel as temporary hosting in < 10 minutes.


Scenario G: Mac / Development Machine Lost or Stolen

Symptoms: Primary development machine unavailable.

Impact: HIGH for development velocity. No immediate user impact.

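Runbook:
1. On any new Mac: brew install flutter git python3
2. Clone repo: git clone https://dev.azure.com/nossie/equitrail/_git/equitrail
3. Restore secrets from Oracle backup (run all three commands from the new Mac):

# Pack the latest backup on the server, then copy the archive to the Mac
ssh equitrail@100.126.14.49 "tar czf /tmp/equitrail-secrets.tar.gz ~/equitrail-backup/latest/"
scp equitrail@100.126.14.49:/tmp/equitrail-secrets.tar.gz ~/
# Remove the temporary archive so secrets do not linger in /tmp on the server
ssh equitrail@100.126.14.49 "rm /tmp/equitrail-secrets.tar.gz"

4. Restore Android keystore, credentials.env, google-services.json, GoogleService-Info.plist.
5. Run flutter pub get. Development is ready.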

Time to full recovery: ~2 hours (Flutter install + restore + verify).


Scenario H: Discord Bot Down

Symptoms: Bot offline in Discord, onboarding not working, RouteHelper not responding.

Impact: LOW — app unaffected, community impacted.

Runbook:
1. SSH: ssh equitrail@100.126.14.49
2. Check status: systemctl --user status equitrail-bot
3. View logs: journalctl --user -u equitrail-bot -n 50
4. Restart: systemctl --user restart equitrail-bot
5. If crash loop: journalctl --user -u equitrail-bot -n 200 → fix code → deploy + restart
6. If Oracle server rebooted: loginctl enable-linger equitrail (run once to ensure auto-start)


4. Communication Plan

During an Incident

Severity | Who to notify | Channel | Timing
CRITICAL (Firebase down, keystore lost) | All users | Discord #announcements + push notification | Within 1 hour
HIGH (routing down > 1 hour) | Discord | Discord #announcements | Within 2 hours
MEDIUM (website down) | No user notification needed | |
LOW (Discord bot down) | | Post in #bot-commands | Best effort

Message Templates

Firebase outage (translated from Dutch):

⚠️ Temporary outage: Cloud sync is currently unavailable due to a Google outage. Ride recording works normally and all your data is safely stored on your device. We will keep you posted. 🐴

Planned maintenance (translated from Dutch):

🔧 Scheduled maintenance: On [date] from [time] to [time], the navigation feature will be temporarily unavailable. GPS tracking and history work normally.


5. Quarterly DR Test Checklist

Run every quarter (next: 2026-09-01):

  • Verify Oracle backup is running: ssh equitrail@100.126.14.49 "ls -la ~/equitrail-backup/"
  • Test keystore restore: copy from backup to /tmp/, run keytool -list
  • Verify Firestore export is scheduled and recent (Cloud Storage bucket)
  • Test routing server restart: sudo systemctl restart graphhopper
  • Verify website deploy: bash scripts/deploy_website.sh --file index.html
  • Test Discord bot restart: systemctl --user restart equitrail-bot
  • Confirm offline keystore copy is accessible (1Password / USB)
  • Review this document for accuracy
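
The command-based items in the checklist lend themselves to a small pass/fail runner. The sketch below is one possible shape, with a hypothetical helper name `run_check`. The manual items (offline keystore copy, document review) stay manual.

```shell
#!/bin/sh
# run_check LABEL COMMAND [ARGS...]
# Runs the command quietly, prints PASS or FAIL with the label,
# and counts failures in the global variable "failures".
failures=0
run_check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    failures=$(( failures + 1 ))
  fi
}

# Example wiring for two checklist items (uncomment on the development Mac):
# run_check "Oracle backup present" ssh equitrail@100.126.14.49 "ls -la ~/equitrail-backup/"
# run_check "Keystore copy restored to /tmp" test -s /tmp/equitrail.jks
# echo "$failures check(s) failed"
```

Because each check is an ordinary command, new checklist items can be added as one line each without touching the helper.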

Appendix A: Re-provision Routing Server

If Oracle Cloud instance is terminated and must be recreated:

# 1. Create new Oracle Cloud ARM instance (Always Free tier)
#    Region: Netherlands (Amsterdam), region identifier eu-amsterdam-1
#    OS: Ubuntu 22.04 LTS
#    Shape: VM.Standard.A1.Flex (ARM, free tier)

# 2. Install Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

# 3. Install Java + GraphHopper
sudo apt-get install -y openjdk-17-jdk
wget https://github.com/graphhopper/graphhopper/releases/download/9.1/graphhopper-web-9.1.jar
# ... (full setup in docs/routing_server_setup.md — TODO: create this)

# 4. Re-configure UFW
sudo ufw allow from 100.64.0.0/10  # Tailscale only
sudo ufw enable

Appendix B: Secondary Git Remote (GitHub Mirror)

Add a GitHub private repo as a secondary remote to protect against Azure DevOps outage:

# One-time setup
git remote add github https://github.com/nossiej/equitrail-private.git

# Push to both on each session
git push origin main && git push github main

# Or set up automatic mirror with Azure DevOps pipeline
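
An alternative to remembering two push commands is to give the primary remote two push URLs, so that a single `git push origin main` updates both hosts. The sketch below uses git's `remote set-url --add --push`. The helper name `add_mirror_push_url` is hypothetical, and it should be run only once per clone because repeated runs add duplicate URLs.

```shell
#!/bin/sh
# add_mirror_push_url REMOTE MIRROR_URL
# Makes "git push REMOTE" push to the remote's existing URL and to MIRROR_URL.
add_mirror_push_url() {
  remote=$1
  mirror_url=$2
  primary_url=$(git remote get-url "$remote") || return 1
  # Explicit push URLs replace the implicit one, so register the primary first.
  git remote set-url --add --push "$remote" "$primary_url"
  git remote set-url --add --push "$remote" "$mirror_url"
}

# Example (inside the repository clone):
# add_mirror_push_url origin https://github.com/nossiej/equitrail-private.git
# git remote get-url --push --all origin   # lists both push URLs
```

Fetches still come from the primary URL only, so Azure DevOps remains the source of truth.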

Appendix C: Firestore Daily Export Setup

# Enable Cloud Firestore export via gcloud
gcloud firestore export gs://equitrail-backups/$(date +%Y%m%d) \
  --project=equitrail \
  --async

# Schedule via Cloud Scheduler (one-time setup in GCP console):
# Target: https://firestore.googleapis.com/v1/projects/equitrail/databases/(default):exportDocuments
# Schedule: 0 3 * * * (daily at 03:00 Amsterdam time)
# Body: {"outputUriPrefix": "gs://equitrail-backups/"}
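
The Cloud Scheduler setup described in the comments can also be done from the command line instead of the GCP console. The following is a sketch under stated assumptions: the job name `firestore-daily-export`, the location `europe-west1`, and the service account passed as an argument are placeholders, not values from this plan. The service account would need permission to run Firestore exports and to write to the bucket.

```shell
#!/bin/sh
# export_request_body BUCKET: JSON body for the exportDocuments call.
export_request_body() {
  printf '{"outputUriPrefix": "gs://%s/"}\n' "$1"
}

# create_export_job SERVICE_ACCOUNT_EMAIL
# Job name, location, and service account are placeholders to adapt.
create_export_job() {
  gcloud scheduler jobs create http firestore-daily-export \
    --project=equitrail \
    --location=europe-west1 \
    --schedule="0 3 * * *" \
    --time-zone="Europe/Amsterdam" \
    --uri="https://firestore.googleapis.com/v1/projects/equitrail/databases/(default):exportDocuments" \
    --http-method=POST \
    --headers="Content-Type=application/json" \
    --message-body="$(export_request_body equitrail-backups)" \
    --oauth-service-account-email="$1"
}
```

Setting `--time-zone` explicitly is what makes the `0 3 * * *` schedule mean 03:00 Amsterdam time. Without it the schedule is interpreted in UTC.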

Last reviewed: 2026-06-01 | Next review due: 2026-09-01