
Architecture

cs-lewis-backend is a single-service Python application — Django + Django Ninja + Wagtail — running on AWS ECS Fargate, serving a Flutter mobile app and the editorial CMS from the same codebase. The backing data stores are Postgres (RDS), Redis (ElastiCache), and S3 with CloudFront edge cache for media.

This document describes what is built and how it is wired. The why behind each component choice lives in ADR-0001.

System Overview

                 Internet
              ┌─────▼─────┐
              │ Cloudflare│  (WAF + DDoS + rate limit)
              └─────┬─────┘
         ┌──────────┴──────────┐
         │                     │
         ▼                     ▼
   ┌──────────┐         ┌────────────┐
   │CloudFront│         │    ALB     │
   │ (media)  │         │            │
   └────┬─────┘         └─────┬──────┘
        │                     │
        ▼                     ▼
   ┌──────────┐    ┌──────────────────────┐
   │    S3    │    │  ECS Fargate         │
   │  audio + │    │  ┌──────────────┐    │
   │  images  │    │  │ Django (web) │    │  ← Django Ninja API
   └──────────┘    │  └──────────────┘    │    + Wagtail Admin
                   │  ┌──────────────┐    │    + Django Admin
                   │  │ Celery (jobs)│    │
                   │  └──────────────┘    │
                   └─────┬───────────┬────┘
                         │           │
                         ▼           ▼
                   ┌──────────┐ ┌────────────┐
                   │   RDS    │ │ElastiCache │
                   │ Postgres │ │   Redis    │
                   └──────────┘ └────────────┘
                              ┌───────▼────────┐
                              │   External     │
                              │  • Anthropic   │
                              │  • ElevenLabs  │
                              │  • Substack    │
                              │  • Sentry      │
                              └────────────────┘

A single web tier serves the API. Cloudflare (WAF + DDoS protection + rate limiting) sits in front of both the ALB, which routes API traffic to the web tier, and CloudFront, which serves media from S3. CDN caching of API responses is deferred to Phase 2; Redis at origin is sufficient at MVP scale. Celery workers run as a separate Fargate service so background workloads cannot starve the API. Postgres holds the content graph and user data; Redis caches API responses and brokers Celery jobs; S3 holds binary media.

Services & Components

Django web service (ECS Fargate)

  • Process: gunicorn + uvicorn workers serving the Django application.
  • Surfaces inside this process:
    • Django Ninja API at /api/v1/*: Flutter mobile contract, async views, auto-generated OpenAPI.
    • Wagtail Admin at /cms/*: editor authoring, StreamField content, scheduled publishing, workflows.
    • Django Admin at /admin/*: user management, reflection moderation queue (Phase 2), operational tooling.
  • Service layer: Wireup is the dependency-injection (DI) container that wires the API to the service layer. All swappable services (Ranker, AI clients, ElevenLabs, Substack sync) are registered once in the container; views and Celery tasks declare their dependencies in their signatures and receive them via @inject / @inject_app. This is what makes the Postgres GIN → Meilisearch Ranker swap mechanical: one registration change, zero call-site changes.
  • Scaling: horizontal on Fargate; ALB fan-out to multiple tasks.
  • Reads from: Postgres (Django ORM), Redis (cache).
  • Writes to: Postgres, Redis (cache writes), Celery queue (job enqueue), Sentry (errors).
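The swap mechanism described above can be sketched in plain Python. This is not Wireup's API: the `_REGISTRY` dict, `register`/`resolve` helpers, the `Ranker.related_to` signature and both backend classes are hypothetical stand-ins, with stubbed results, that only show why a backend change touches one registration and no call sites.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


@dataclass(frozen=True)
class RelatedPassage:
    id: int
    overlap: int


class Ranker(Protocol):
    """Interface every ranker backend implements (hypothetical signature)."""

    def related_to(self, passage_id: int, filters: dict) -> List[RelatedPassage]: ...


class PostgresGinRanker:
    """MVP backend. A real version would run the GIN array-overlap query."""

    def related_to(self, passage_id: int, filters: dict) -> List[RelatedPassage]:
        return [RelatedPassage(id=2, overlap=3)]  # stubbed result for the sketch


class MeilisearchRanker:
    """Phase 2 backend. A real version would query a Meilisearch index."""

    def related_to(self, passage_id: int, filters: dict) -> List[RelatedPassage]:
        return [RelatedPassage(id=7, overlap=5)]  # stubbed result for the sketch


# Plain-dict stand-in for the DI container: one factory per interface.
_REGISTRY: Dict[type, Callable[[], object]] = {Ranker: PostgresGinRanker}


def register(interface: type, factory: Callable[[], object]) -> None:
    """Swapping a backend is this single call; no call site changes."""
    _REGISTRY[interface] = factory


def resolve(interface: type) -> object:
    return _REGISTRY[interface]()


def dive_deeper_view(passage_id: int, ranker: Ranker) -> List[int]:
    # The view depends only on the Ranker interface, never on a concrete backend.
    return [p.id for p in ranker.related_to(passage_id, filters={})]
```

In the service itself, the container plays the role of `_REGISTRY` and the injection decorator supplies the `ranker` argument, so `dive_deeper_view` would not call `resolve` itself.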

Celery worker service (ECS Fargate)

  • Process: Celery workers and a Celery Beat scheduler, deployed as separate Fargate services.
  • Jobs handled at MVP:
    • AI tag drafting on content ingest (Anthropic SDK).
    • ElevenLabs audio pre-generation per published passage.
    • Substack sync polling.
    • Scheduled publishing for Daily Drop via Celery Beat.
  • Phase 2 additions: Meilisearch reindex on publish; receipt validation hooks; semantic-similarity index updates if pgvector is adopted.
  • Scaling: horizontal; queue depth drives autoscaling.
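One way to turn "queue depth drives autoscaling" into a task count is target tracking on backlog per task. A minimal sketch, assuming a hypothetical target of 50 queued jobs per worker task and a ceiling of 10 tasks; the floor of 2 mirrors the MVP "2 tasks per queue" figure under Scaling Characteristics.

```python
import math


def desired_worker_tasks(
    queue_depth: int,
    target_backlog_per_task: int = 50,  # hypothetical tuning value
    min_tasks: int = 2,                 # MVP floor per queue
    max_tasks: int = 10,                # hypothetical cost ceiling
) -> int:
    """Size the worker service so each task owns roughly N queued jobs."""
    if target_backlog_per_task <= 0:
        raise ValueError("target_backlog_per_task must be positive")
    wanted = math.ceil(queue_depth / target_backlog_per_task)
    # Clamp so an empty queue keeps the floor and a burst cannot exceed the cap.
    return max(min_tasks, min(max_tasks, wanted))
```

With these defaults a backlog of 230 jobs asks for 5 tasks, and any backlog above 500 is capped at 10.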

Postgres (AWS RDS, managed)

  • Holds: Wagtail content (pages, snippets, StreamField payloads), Django models (users, reflections, daily drops, audit), tag arrays with GIN indexes for overlap queries.
  • Configuration: Postgres 16, multi-AZ in prod, single-AZ in staging, automated daily snapshots, PITR (35-day window in prod).
  • Extensions: pg_trgm for fuzzy text matching where useful; planned pgvector for Phase 2 semantic similarity if it earns its place.

Redis (AWS ElastiCache, managed)

  • Holds: API response cache, Celery broker, Celery result backend (short TTL), JWT denylist and refresh-token revocation, rate-limit counters.
  • Configuration: Redis 7, single-shard at MVP, multi-AZ replica in prod.
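For the JWT denylist, a revoked token only needs to stay denylisted until its own `exp` claim passes, which keeps the Redis footprint bounded. The sketch below uses an in-memory dict in place of Redis; the `jwt:deny:{jti}` key scheme is hypothetical, while `jti` and `exp` are the standard JWT claims.

```python
import time
from typing import Dict, Optional


class TokenDenylist:
    """In-memory stand-in for Redis keys such as jwt:deny:{jti} (hypothetical scheme)."""

    def __init__(self) -> None:
        self._expires_at: Dict[str, float] = {}

    def revoke(self, jti: str, token_exp: float, now: Optional[float] = None) -> float:
        """Denylist a token until its own exp; returns the TTL used (0 if already expired)."""
        now = time.time() if now is None else now
        ttl = max(0.0, token_exp - now)
        if ttl > 0:
            # Redis equivalent: SET jwt:deny:{jti} 1 EX <ttl in whole seconds>
            self._expires_at[jti] = now + ttl
        return ttl

    def is_revoked(self, jti: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        expires_at = self._expires_at.get(jti)
        return expires_at is not None and now < expires_at
```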

S3 + CloudFront

  • Holds: ElevenLabs-generated audio files, AI-generated featured images, editor uploads.
  • Access: read-public at MVP (all content is guest-accessible). Signed URLs in Phase 2 for tier-gated audio.
  • Cache: CloudFront edge in front of S3; long TTL on immutable content keyed by content hash.
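Keying media by content hash is what makes a long edge TTL safe: the bytes behind a key never change. A minimal sketch, assuming a `prefix/<sha256>.<ext>` layout and a one-year `Cache-Control` value; both are illustrative choices, not the project's confirmed conventions.

```python
import hashlib

# One year. Safe only because a key's bytes never change ("immutable", RFC 8246).
IMMUTABLE_CACHE_CONTROL = "public, max-age=31536000, immutable"


def media_key(data: bytes, prefix: str, extension: str) -> str:
    """Key an S3 object by the SHA-256 of its bytes, e.g. audio/<digest>.mp3."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{prefix}/{digest}.{extension}"
```

Regenerated audio produces a different key, so the API simply returns the new CloudFront URL and no edge invalidation is needed.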

Request Flow

Guest read — today's passage

  1. Mobile client → WAF → ALB → Django Ninja.
  2. Redis cache lookup on today:{date}:{register}. HIT returns immediately.
  3. MISS: Postgres query for today's DailyDrop joined to its Passage → Pydantic response → cache write → response.
  4. Mobile fetches audio + featured image directly via CloudFront URLs.
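Steps 2 and 3 are a cache-aside read. The sketch below uses an in-memory TTL cache in place of Redis and a caller-supplied loader in place of the DailyDrop/Passage query; the one-hour default TTL is a placeholder for "aligned to Daily Drop cadence".

```python
import datetime as dt
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for Redis GET and SET ... EX."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str, now: float) -> Optional[dict]:
        entry = self._data.get(key)
        if entry is None or entry[0] <= now:  # missing or expired
            return None
        return entry[1]

    def set(self, key: str, value: dict, ttl: float, now: float) -> None:
        self._data[key] = (now + ttl, value)


def todays_passage(
    cache: TTLCache,
    day: dt.date,
    register: str,
    load_from_db: Callable[[dt.date, str], dict],
    ttl: float = 3600.0,  # placeholder; tune to the Daily Drop cadence
    now: Optional[float] = None,
) -> dict:
    now = time.time() if now is None else now
    key = f"today:{day.isoformat()}:{register}"
    cached = cache.get(key, now)
    if cached is not None:  # HIT: return immediately
        return cached
    payload = load_from_db(day, register)  # MISS: DailyDrop joined to its Passage
    cache.set(key, payload, ttl, now)
    return payload
```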

Guest read — dive deeper from a passage

  1. Mobile → WAF → ALB → Django Ninja.
  2. Ranker.related_to(passage_id, filters) is invoked.
  3. At MVP, the ranker is backed by Postgres GIN array overlap:
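    -- Bind parameters: :seed_motifs, :seed_id, :register_filter.
    -- Assumes motif_tags is text[]. Core Postgres has no & intersection operator
    -- for arrays (the intarray extension adds one for int[] only), so the
    -- overlap count is computed with INTERSECT.
    SELECT id, label,
           cardinality(ARRAY(
               SELECT unnest(motif_tags)
               INTERSECT
               SELECT unnest(CAST(:seed_motifs AS text[]))
           )) AS overlap
    FROM passage
    WHERE motif_tags && CAST(:seed_motifs AS text[])  -- GIN-indexable overlap test
      AND id <> :seed_id
      AND register = ANY(:register_filter)
    ORDER BY overlap DESC, id
    LIMIT 20;
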
  4. Response cached under related:{passage_id}:{filters_hash}; TTL tuned to publish cadence.
  5. Phase 2 swap: the Ranker implementation moves to Meilisearch without changing call-sites.
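The ranking semantics of the MVP query can be restated in plain Python, which is handy as a reference when testing the SQL or a future Meilisearch-backed Ranker against the same fixtures. The `Passage` shape is trimmed to what the query uses, and the cache-key helper assumes a JSON-canonicalised, 16-character SHA-256 prefix as `filters_hash`; both are illustrative. List order inside filter values still affects the hash, so callers should sort filter lists first.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Passage:
    id: int
    register: str
    motif_tags: FrozenSet[str]


def related_to(
    seed: Passage, corpus: List[Passage], register_filter: List[str], limit: int = 20
) -> List[Tuple[int, int]]:
    """Same semantics as the SQL: (id, overlap) pairs, best overlap first."""
    rows = []
    for p in corpus:
        overlap = len(p.motif_tags & seed.motif_tags)  # size of the tag intersection
        if overlap > 0 and p.id != seed.id and p.register in register_filter:
            rows.append((p.id, overlap))
    rows.sort(key=lambda row: (-row[1], row[0]))  # ORDER BY overlap DESC, id
    return rows[:limit]


def related_cache_key(passage_id: int, filters: dict) -> str:
    """related:{passage_id}:{filters_hash}, stable regardless of dict key order."""
    canonical = json.dumps(filters, sort_keys=True, separators=(",", ":"))
    filters_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"related:{passage_id}:{filters_hash}"
```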

Editor publish — passage with AI-drafted tags

  1. Editor authors a passage in Wagtail admin and saves a draft.
  2. Post-save signal enqueues draft_tags_for_passage(passage_id) on Celery.
  3. Worker calls Anthropic SDK with the passage text + tag taxonomy prompt → returns drafted tags with confidence scores.
  4. Worker writes the draft tags to the passage (review_status=draft) and surfaces a "needs review" indicator in the admin.
  5. Editor reviews and edits the drafted tags, then moves the workflow state to reviewed.
  6. Wagtail's workflow gate allows publish. On publish, a post-publish signal:
    • Invalidates the relevant Redis cache keys.
    • Enqueues generate_audio_reading(passage_id) → ElevenLabs → S3.
    • (Phase 2) Enqueues reindex_passage_in_search(passage_id).
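The publish gate and post-publish side effects can be sketched without Wagtail or Celery. Everything here is illustrative: the `review_status` value, the `RecordingQueue` stand-in for Celery, and especially the invalidation policy (drop this passage's `related:*` entries and every `today:*` entry), which is deliberately coarse and would need refining against the real key set.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class RecordingQueue:
    """Stand-in for Celery: records (task_name, passage_id) instead of sending."""
    calls: List[Tuple[str, int]] = field(default_factory=list)

    def enqueue(self, task_name: str, passage_id: int) -> None:
        self.calls.append((task_name, passage_id))


def publish_passage(
    passage_id: int,
    review_status: str,
    cache_keys: Set[str],
    queue: RecordingQueue,
    search_enabled: bool = False,
) -> Set[str]:
    """Gate on tag review, then run post-publish side effects. Returns invalidated keys."""
    if review_status != "reviewed":
        raise PermissionError("drafted tags must be reviewed before publish")
    # Coarse, hypothetical policy: this passage's related:* keys plus all today:* keys.
    stale = {
        k for k in cache_keys
        if k.startswith(f"related:{passage_id}:") or k.startswith("today:")
    }
    cache_keys.difference_update(stale)
    queue.enqueue("generate_audio_reading", passage_id)  # ElevenLabs -> S3
    if search_enabled:  # Phase 2
        queue.enqueue("reindex_passage_in_search", passage_id)
    return stale
```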

Data Stores

| Store | Holds | Backup / DR |
|---|---|---|
| Postgres (RDS) | Wagtail content, Django models, tag indexes | Automated daily snapshots + PITR (35-day window in prod); multi-AZ failover |
| Redis (ElastiCache) | Cache, Celery broker, sessions, rate-limit counters | Daily snapshot to S3; cache loss tolerated (rebuilt on miss); broker loss requires job replay |
| S3 | Media (audio, images, uploads) | Object versioning enabled; cross-region replication TBD for prod |

See docs/cms-architecture.md for the content data model and collection schemas.

Network Topology

  • VPC: dedicated VPC per environment with public + private subnets across two AZs.
  • Public subnets: ALB, NAT gateway.
  • Private subnets: ECS tasks (web + Celery), RDS, ElastiCache.
  • Security groups: ALB → ECS (443); ECS → RDS (5432); ECS → Redis (6379); ECS → S3 via VPC gateway endpoint (no NAT egress for S3).
  • Egress: outbound to Anthropic, ElevenLabs, Substack, Sentry via NAT gateway.
  • DNS: Route 53; domain to be confirmed with client.
  • TLS: ACM-issued certs on ALB and CloudFront; TLS 1.2+ enforced.

Environments

| Environment | Purpose | Differences from prod |
|---|---|---|
| Local (dev) | Developer workstation | Docker Compose: Django + Postgres + Redis containers; LocalStack or direct S3 for media |
| Staging | Pre-prod testing, editor preview, client review | Single-AZ RDS, smaller Fargate task sizes, no autoscaling, open or no WAF, separate Sentry environment |
| Production | Public launch (~Oct 2026) | Multi-AZ RDS with PITR, autoscaling ECS, WAF enabled, full Sentry, CloudFront enabled |

Infrastructure as Code

| Aspect | Choice | Notes |
|---|---|---|
| Tool | Terraform (proposed) | AWS-aligned stack; widely supported; team familiarity |
| Layout | Per-stack modules: network/, compute/, data/, secrets/, edge/ | Each module independently planned and applied |
| Location | infra/ in this repo | Single repo keeps app code and infra moving together; PRs can span both |

Detailed IaC choices land in deployment.md once infrastructure is provisioned.

External Dependencies

| Service | Used for | Phase | Failure handling |
|---|---|---|---|
| Anthropic / OpenAI | AI tag drafting on ingest | MVP | Celery retry with exponential backoff; failures logged to Sentry; editor can re-trigger from admin |
| ElevenLabs | Audio pre-generation (cloned voice) | MVP | Celery retry; failed jobs visible in admin; audio is non-blocking (text-only fallback) |
| Substack | Article sync into the CMS | MVP | Celery polling job; per-article idempotent insert; sync failures non-fatal |
| Bookstore affiliate links | Purchase handoff (source ladder) | MVP | Stored as a field on Work; no runtime dependency |
| HarperCollins | Content rights (no Narnia audio) | MVP | Enforced via the Work.rights flag; chapter audio gated at the API |
| App Store / Play Store | Apple/Google Sign-In, account deletion (V1); IAP (Phase 2) | V1 + Phase 2 | Server-side receipt validation; account-deletion endpoint must be production-tested before App Store submission |
| Sentry | Error tracking | MVP | Soft dependency; the application continues if Sentry is unreachable |
| AWS (RDS, ElastiCache, S3, CloudFront, ECS, Secrets Manager) | Managed services | MVP | AWS SLAs; multi-AZ for the prod data tier |
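The "Celery retry with exponential backoff" policy comes down to a small calculation. Celery's `autoretry_for`, `retry_backoff`, `retry_backoff_max` and `retry_jitter` task options implement a comparable policy built in; the sketch below only shows the arithmetic, with a hypothetical 2-second factor and 600-second cap, and full jitter applied when a random generator is passed.

```python
import random
from typing import Optional


def retry_delay(
    retries: int,
    factor: float = 2.0,     # hypothetical base delay in seconds
    maximum: float = 600.0,  # hypothetical cap in seconds
    rng: Optional[random.Random] = None,
) -> float:
    """Exponential backoff: factor * 2**retries, capped; full jitter if rng is given."""
    countdown = min(maximum, factor * (2 ** retries))
    if rng is not None:
        # Full jitter spreads retries from many failed jobs across [0, countdown],
        # so a provider outage does not produce synchronised retry waves.
        countdown = rng.uniform(0, countdown)
    return countdown
```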

Security Architecture

| Concern | Mitigation |
|---|---|
| CSRF (admin) | Django CSRF middleware enabled on all admin POSTs |
| XSS | Django template auto-escaping; Wagtail/Draftail sanitisation; reflection input HTML-escaped before render |
| SQL injection | Django ORM parameterised queries; raw SQL (such as the MVP Ranker query) takes user input only as bound parameters, never via string interpolation |
| Secrets | AWS Secrets Manager / SSM Parameter Store; never in code or container images |
| Transport | TLS 1.2+ via ALB and CloudFront; HSTS headers set by Django |
| Rate limiting | Redis-backed (django-ratelimit) on write endpoints; WAF rate limit on the public API |
| Auth (V1) | django-allauth (headless) + Apple/Google Sign-In + JWT; password hashing via Django's PASSWORD_HASHERS (PBKDF2 by default; Argon2 if enabled) |
| Account deletion (V1) | Dedicated endpoint with hard-delete + anonymise strategy per FK; must be production-tested before App Store submission |
| PII handling | V1 PII limited to email + pseudonym (accounts ship in V1); encryption at rest via RDS |
| Audit | Wagtail page history on content edits; Django admin log on user actions; structured request logs |
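Redis-backed rate limiting is typically a fixed-window counter: increment a key per client per window and reject once the count exceeds the limit. The sketch below illustrates that idea with an in-memory dict standing in for Redis; it is not django-ratelimit's code, and the `rl:{client}:{window_start}` key format and limits are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict


class FixedWindowLimiter:
    """In-memory stand-in for Redis INCR + EXPIRE counters."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window = window_seconds
        # In Redis, EXPIRE would drop old windows; this stand-in never prunes.
        self.counts: DefaultDict[str, int] = defaultdict(int)

    def allow(self, client_id: str, now: float) -> bool:
        window_start = int(now // self.window) * self.window
        key = f"rl:{client_id}:{window_start}"
        self.counts[key] += 1  # Redis: INCR key, then EXPIRE key <window> on first hit
        return self.counts[key] <= self.limit
```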

Scaling Characteristics

| Tier | At MVP | Headroom |
|---|---|---|
| ALB | Auto-managed | |
| Django web (Fargate) | 2 tasks; autoscale on CPU + request count | Horizontal scale to dozens; Phase 2 may split read/write tiers |
| Celery workers | 2 tasks per queue; autoscale on queue depth | Add dedicated queues per job type as needed |
| Postgres | db.t4g.medium (single-AZ in staging, multi-AZ in prod) | Vertical scale first; read replicas in Phase 2 if needed |
| Redis | cache.t4g.micro | Vertical scale first; cluster mode in Phase 2 if needed |
| S3 / CloudFront | Auto-managed | |

Performance Targets

| Surface | p95 target | Notes |
|---|---|---|
| Today's Passage API | < 100 ms (cache hit) / < 300 ms (cache miss) | Cache TTL aligned to Daily Drop cadence |
| Dive-deeper API | < 200 ms (cache hit) / < 500 ms (cache miss) | Postgres GIN at MVP; Meilisearch swap if p95 degrades |
| Editor publish → reader visibility | < 30 s | Cache invalidation + downstream Celery jobs |
| Audio playback start | < 1 s | CloudFront-cached MP3/AAC; pre-generated, not on-demand |
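The targets above do not say how p95 is computed. A minimal sketch, assuming the nearest-rank definition over a window of latency samples in milliseconds; monitoring tools may interpolate instead, which can shift results slightly on small samples.

```python
from typing import List


def p95(samples_ms: List[float]) -> float:
    """Nearest-rank 95th percentile: the smallest sample >= 95% of all samples."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def meets_target(samples_ms: List[float], target_ms: float) -> bool:
    # Targets are strict upper bounds ("< 100 ms"), hence the strict comparison.
    return p95(samples_ms) < target_ms
```

Integer arithmetic for the rank avoids floating-point off-by-one errors in `0.95 * n`.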

Open Items

  • Domain / DNS — to be confirmed with client.
  • Account-deletion strategy (V1) — FK on-delete strategies (cascade vs anonymise) to be defined in the first sprint; must be production-tested before App Store submission.
  • Passage vs Article modeling — to resolve in docs/cms-architecture.md before first sprint commits.