Architecture

cs-lewis-backend is a single Python codebase (Django, Django Ninja, and Wagtail) running on AWS ECS Fargate. It serves both the Flutter mobile app's API and the editorial CMS. The backing data stores are Postgres (RDS), Redis (ElastiCache), and S3 with a CloudFront edge cache for media.

This document describes what is built and how it is wired. The why behind each component choice lives in ADR-0001.

System Overview

                 Internet
                    │
              ┌─────▼─────┐
              │ Cloudflare│  (WAF + DDoS + rate limit)
              └─────┬─────┘
                    │
         ┌──────────┴──────────┐
         │                     │
         ▼                     ▼
   ┌──────────┐         ┌────────────┐
   │CloudFront│         │    ALB     │
   │ (media)  │         │            │
   └────┬─────┘         └─────┬──────┘
        │                     │
        ▼                     ▼
   ┌──────────┐    ┌──────────────────────┐
   │    S3    │    │  ECS Fargate         │
   │  audio + │    │  ┌──────────────┐    │
   │  images  │    │  │ Django (web) │    │  ← Django Ninja API
   └──────────┘    │  └──────────────┘    │    + Wagtail Admin
                   │  ┌──────────────┐    │    + Django Admin
                   │  │ Celery (jobs)│    │
                   │  └──────────────┘    │
                   └─────┬───────────┬────┘
                         │           │
                         ▼           ▼
                   ┌──────────┐ ┌────────────┐
                   │   RDS    │ │ElastiCache │
                   │ Postgres │ │   Redis    │
                   └──────────┘ └────────────┘
                                      │
                              ┌───────▼────────┐
                              │   External     │
                              │  • Anthropic   │
                              │  • ElevenLabs  │
                              │  • Substack    │
                              │  • Sentry      │
                              └────────────────┘

The API is served by a single web tier behind Cloudflare (WAF, DDoS protection, rate limiting) and the ALB; media is served from S3 through CloudFront. CDN-fronting of the API is deferred to Phase 2 because the Redis cache at origin is sufficient at MVP scale. Celery workers run as a separate Fargate service so background workloads cannot starve the API. Postgres holds the content graph and user data; Redis caches API responses and brokers Celery jobs; S3 holds binary media.

Services & Components

Django web service (ECS Fargate)

Process: gunicorn + uvicorn workers serving the Django application.
Surfaces inside this process:
Django Ninja API at /api/v1/* — Flutter mobile contract, async views, auto-generated OpenAPI.
Wagtail Admin at /cms/* — editor authoring, StreamField content, scheduled publishing, workflows.
Django Admin at /admin/* — user management, reflection moderation queue (Phase 2), operational tooling.
Service layer: Wireup is the DI container that wires API views to the service layer. All swappable services (Ranker, AI clients, ElevenLabs, Substack sync) are registered once in the container; views and Celery tasks declare their dependencies by signature and receive them via @inject / @inject_app. This is what makes the Postgres GIN → Meilisearch Ranker swap mechanical: one registration change, zero call-site changes.
Scaling: horizontal on Fargate; ALB fan-out to multiple tasks.
Reads from: Postgres (Django ORM), Redis (cache).
Writes to: Postgres, Redis (cache writes), Celery queue (job enqueue), Sentry (errors).
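
The registration-based swap described under "Service layer" can be sketched with the standard library alone. This is not Wireup's API: the `Container` class, `PostgresOverlapRanker`, `MeilisearchRanker`, and `related_view` are hypothetical stand-ins, and an in-memory dictionary of tag sets replaces the real Postgres query. The sketch only illustrates why call-sites stay untouched when the registered implementation changes.

```python
from typing import Callable, Protocol


class Ranker(Protocol):
    """Interface that call-sites depend on (name taken from the text above)."""

    def related_to(self, passage_id: int, limit: int = 20) -> list[int]: ...


class PostgresOverlapRanker:
    """Hypothetical MVP ranker; a dict of tag sets stands in for Postgres."""

    def __init__(self, tags_by_passage: dict[int, set[str]]) -> None:
        self._tags = tags_by_passage

    def related_to(self, passage_id: int, limit: int = 20) -> list[int]:
        seed = self._tags[passage_id]
        scored = [
            (len(seed & tags), pid)
            for pid, tags in self._tags.items()
            if pid != passage_id and seed & tags
        ]
        # Highest overlap first; ties broken by lower id for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [pid for _, pid in scored[:limit]]


class MeilisearchRanker:
    """Hypothetical Phase 2 ranker; returns canned results in this sketch."""

    def __init__(self, canned: dict[int, list[int]]) -> None:
        self._canned = canned

    def related_to(self, passage_id: int, limit: int = 20) -> list[int]:
        return self._canned.get(passage_id, [])[:limit]


class Container:
    """Toy registry standing in for the DI container (not Wireup's API)."""

    def __init__(self) -> None:
        self._factories: dict[object, Callable[[], object]] = {}

    def register(self, interface: object, factory: Callable[[], object]) -> None:
        self._factories[interface] = factory

    def resolve(self, interface: object):
        return self._factories[interface]()


def related_view(container: Container, passage_id: int) -> list[int]:
    """Call-site: knows only the Ranker interface, never the backend."""
    ranker: Ranker = container.resolve(Ranker)
    return ranker.related_to(passage_id)


# Illustrative data only; tag names are made up for the example.
TAGS = {1: {"joy", "longing"}, 2: {"joy"}, 3: {"grief"}, 4: {"joy", "longing"}}

container = Container()
container.register(Ranker, lambda: PostgresOverlapRanker(TAGS))
mvp_result = related_view(container, 1)

# The swap is one registration change; related_view is untouched.
container.register(Ranker, lambda: MeilisearchRanker({1: [4, 2]}))
phase2_result = related_view(container, 1)
```

In the real service the container would be Wireup and the dependency would arrive through the view's signature rather than an explicit `resolve` call; the shape of the swap is the same.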

Celery worker service (ECS Fargate)

Process: Celery workers and a Celery Beat scheduler, deployed as separate Fargate services.
Jobs handled at MVP:
AI tag drafting on content ingest (Anthropic SDK).
ElevenLabs audio pre-generation per published passage.
Substack sync polling.
Scheduled publishing for Daily Drop via Celery Beat.
Phase 2 additions: Meilisearch reindex on publish; receipt validation hooks; semantic-similarity index updates if pgvector is adopted.
Scaling: horizontal; queue depth drives autoscaling.
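
One way to read "queue depth drives autoscaling" is as a simple step function from backlog to task count. The sketch below is an assumption about how such a policy might be expressed, not the project's actual scaling rule: `jobs_per_worker` and `max_workers` are invented values, and in practice the calculation would more likely live in an ECS scaling policy than in application code.

```python
import math


def desired_worker_count(
    queue_depth: int,
    jobs_per_worker: int = 25,  # hypothetical backlog one task should absorb
    min_workers: int = 2,       # hypothetical floor
    max_workers: int = 10,      # hypothetical ceiling
) -> int:
    """Return the worker task count for a given backlog, clamped to bounds."""
    if queue_depth < 0:
        raise ValueError("queue_depth must be non-negative")
    needed = math.ceil(queue_depth / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))
```

For example, `desired_worker_count(100)` asks for four tasks under these assumed defaults, while an empty queue stays at the floor of two.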

Postgres (AWS RDS, managed)

Holds: Wagtail content (pages, snippets, StreamField payloads), Django models (users, reflections, daily drops, audit), tag arrays with GIN indexes for overlap queries.
Configuration: Postgres 16, multi-AZ in prod, single-AZ in staging, automated daily snapshots, PITR (35-day window in prod).
Extensions: pg_trgm for fuzzy text matching where useful; pgvector is a Phase 2 candidate for semantic similarity, to be adopted only if evaluation justifies it.

Redis (AWS ElastiCache, managed)

Holds: API response cache, Celery broker, Celery result backend (short TTL), JWT denylist and refresh-token revocation, rate-limit counters.
Configuration: Redis 7, single-shard at MVP, multi-AZ replica in prod.

S3 + CloudFront

Holds: ElevenLabs-generated audio files, AI-generated featured images, editor uploads.
Access: publicly readable at MVP (all content is guest-accessible). Phase 2 adds signed URLs for tier-gated audio.
Cache: CloudFront edge in front of S3; long TTL on immutable content keyed by content hash.
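
Keying immutable media by content hash means a changed file always gets a new URL, so a long edge TTL never serves stale bytes. A minimal sketch of that idea follows; the key layout (`kind/digest.extension`), the choice of SHA-256, and the one-year `max-age` are assumptions made for illustration, not values taken from this project.

```python
import hashlib


def media_object_key(data: bytes, kind: str, extension: str) -> str:
    """Build an object key whose name changes whenever the bytes change."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{kind}/{digest}.{extension}"


def immutable_cache_headers(max_age_seconds: int = 31_536_000) -> dict[str, str]:
    """Headers for an object that is never rewritten under the same key."""
    return {"Cache-Control": f"public, max-age={max_age_seconds}, immutable"}
```

Regenerating a passage's audio would then produce a different key, and the old object can simply age out rather than being invalidated at the edge.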

Request Flow

Guest read — today's passage

Mobile client → WAF → ALB → Django Ninja.
Redis cache lookup on today:{date}:{register}. HIT returns immediately.
MISS: Postgres query for today's DailyDrop joined to its Passage → Pydantic response → cache write → response.
Mobile fetches audio + featured image directly via CloudFront URLs.
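
The HIT/MISS steps above follow the cache-aside pattern. The sketch below shows that pattern with an in-memory `TTLCache` standing in for Redis and a `load_from_db` callable standing in for the Postgres query; both are hypothetical, as are the ISO date format in the key and the one-hour default TTL.

```python
import datetime as dt
from typing import Callable, Optional


class TTLCache:
    """In-memory stand-in for Redis GET and SET-with-expiry (hypothetical)."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self._items: dict[str, tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._items.get(key)
        if entry is None or entry[0] <= self._clock():
            return None  # absent or expired
        return entry[1]

    def set(self, key: str, value: dict, ttl_seconds: float) -> None:
        self._items[key] = (self._clock() + ttl_seconds, value)


def today_cache_key(day: dt.date, register: str) -> str:
    """Key shape from the flow above; ISO date formatting is an assumption."""
    return f"today:{day.isoformat()}:{register}"


def get_todays_passage(
    cache: TTLCache,
    load_from_db: Callable[[dt.date, str], dict],
    day: dt.date,
    register: str,
    ttl_seconds: float = 3600.0,  # hypothetical TTL
) -> tuple[dict, str]:
    """Return (payload, "HIT" or "MISS"), filling the cache on a miss."""
    key = today_cache_key(day, register)
    cached = cache.get(key)
    if cached is not None:
        return cached, "HIT"
    payload = load_from_db(day, register)
    cache.set(key, payload, ttl_seconds)
    return payload, "MISS"
```

Injecting the clock keeps expiry testable without sleeping; the production equivalent would rely on Redis key expiry instead.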

Guest read — dive deeper from a passage

Mobile → WAF → ALB → Django Ninja.
Ranker.related_to(passage_id, filters) is invoked.

At MVP, the ranker is backed by Postgres GIN array overlap:

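SELECT id, label,
       -- Count the shared tags. The `&` intersection operator is only defined
       -- for int[] with the intarray extension; this form works for any array type.
       cardinality(ARRAY(
           SELECT unnest(motif_tags)
           INTERSECT
           SELECT unnest(:seed_motifs)
       )) AS overlap
FROM passage
WHERE motif_tags && :seed_motifs          -- overlap test served by the GIN index
  AND id != :seed_id                      -- exclude the seed passage itself
  AND register = ANY(:register_filter)
ORDER BY overlap DESC
LIMIT 20;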

Response cached under related:{passage_id}:{filters_hash}; TTL tuned to publish cadence.
Phase 2 swap: the Ranker implementation moves to Meilisearch without changing call-sites.
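
For the `related:{passage_id}:{filters_hash}` key to be effective, equivalent filter sets need to hash to the same value regardless of key order or list order. A minimal sketch, assuming filters arrive as a JSON-serialisable dictionary; the normalisation rules, SHA-256, and the 16-character truncation are illustrative choices rather than the project's documented scheme.

```python
import hashlib
import json


def filters_hash(filters: dict) -> str:
    """Hash filters so that key order and list order do not change the key."""
    normalised = {
        name: sorted(value) if isinstance(value, (list, set, tuple)) else value
        for name, value in filters.items()
    }
    canonical = json.dumps(normalised, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def related_cache_key(passage_id: int, filters: dict) -> str:
    """Build the cache key in the shape named in the flow above."""
    return f"related:{passage_id}:{filters_hash(filters)}"
```

Without the normalisation step, two clients sending the same filters in a different order would each miss the cache and store duplicate entries.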

Editor publish — passage with AI-drafted tags

Editor authors a passage in Wagtail admin and saves a draft.
Post-save signal enqueues draft_tags_for_passage(passage_id) on Celery.
Worker calls Anthropic SDK with the passage text + tag taxonomy prompt → returns drafted tags with confidence scores.
Worker writes the draft tags to the passage (review_status=draft) and surfaces a "needs review" indicator in the admin.
Editor reviews, edits, flips workflow state to reviewed.
Wagtail's workflow gate allows publish. On publish, a post-publish signal:
Invalidates the relevant Redis cache keys.
Enqueues generate_audio_reading(passage_id) → ElevenLabs → S3.
(Phase 2) Enqueues reindex_passage_in_search(passage_id).
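
The review gate and the post-publish steps can be sketched as a single function. Everything here is a simplified stand-in: `Passage` is a plain dataclass rather than a Wagtail page, `Effects` records side effects in place of Redis and the Celery queue, and the cache prefixes chosen for invalidation are illustrative guesses at what "relevant" keys might mean.

```python
from dataclasses import dataclass, field


@dataclass
class Passage:
    id: int
    review_status: str = "draft"  # "draft" until an editor marks it "reviewed"
    published: bool = False


@dataclass
class Effects:
    """Records side effects; stands in for Redis and the Celery queue."""

    cache: dict[str, object] = field(default_factory=dict)
    enqueued: list[tuple[str, int]] = field(default_factory=list)


def publish(passage: Passage, effects: Effects) -> None:
    """Publish only reviewed passages, then run the post-publish steps."""
    if passage.review_status != "reviewed":
        raise PermissionError("tags must be reviewed before publishing")
    passage.published = True

    # Drop cached responses that may now be stale (prefixes are illustrative).
    stale_prefixes = ("today:", f"related:{passage.id}:")
    for key in [k for k in effects.cache if k.startswith(stale_prefixes)]:
        del effects.cache[key]

    # Audio generation is queued rather than run inline, so publish stays fast.
    effects.enqueued.append(("generate_audio_reading", passage.id))
```

In the real system the gate is Wagtail's workflow and the hook is a post-publish signal; the ordering shown here (gate, invalidate, enqueue) is the part the sketch is meant to convey.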

Data Stores

Store	Holds	Backup / DR
Postgres (RDS)	Wagtail content, Django models, tag indexes	Automated daily snapshots + PITR (35-day window in prod); multi-AZ failover
Redis (ElastiCache)	Cache, Celery broker and result backend, JWT denylist, rate-limit counters	Daily snapshot to S3; cache loss tolerated (rebuilt on miss); broker loss requires job replay
S3	Media (audio, images, uploads)	Object versioning enabled; cross-region replication TBD for prod

See docs/cms-architecture.md for the content data model and collection schemas.

Network Topology

VPC: dedicated VPC per environment with public + private subnets across two AZs.
Public subnets: ALB, NAT gateway.
Private subnets: ECS tasks (web + Celery), RDS, ElastiCache.
Security groups: ALB → ECS (443); ECS → RDS (5432); ECS → Redis (6379); ECS → S3 via VPC gateway endpoint (no NAT egress for S3).
Egress: outbound to Anthropic, ElevenLabs, Substack, Sentry via NAT gateway.
DNS: Route 53; domain to be confirmed with client.
TLS: ACM-issued certs on ALB and CloudFront; TLS 1.2+ enforced.

Environments

Environment	Purpose	Differences from prod
Local (dev)	Developer workstation	Docker Compose: Django + Postgres + Redis containers; LocalStack or direct S3 for media
Staging	Pre-prod testing, editor preview, client review	Single-AZ RDS, smaller Fargate task sizes, no autoscaling, WAF in permissive mode or disabled, separate Sentry environment
Production	Public launch (~Oct 2026)	Multi-AZ RDS with PITR, autoscaling ECS, WAF enabled, full Sentry, CloudFront enabled

Infrastructure as Code

	Choice	Notes
Tool	Terraform (proposed)	AWS-aligned stack; widely supported; team familiarity
Layout	Per-stack modules: `network/`, `compute/`, `data/`, `secrets/`, `edge/`	Each module independently planned and applied
Location	`infra/` in this repo	Single repo keeps app code and infra moving together; PRs can span both

Detailed IaC choices will be documented in docs/guides/deployment.md once infrastructure is provisioned.

External Dependencies

Service	Used for	Phase	Failure handling
Anthropic / OpenAI	AI tag drafting on ingest	MVP	Celery retry with exponential backoff; failures logged to Sentry; editor can re-trigger from admin
ElevenLabs	Audio pre-generation (cloned voice)	MVP	Celery retry; failed jobs visible in admin; audio is non-blocking (text-only fallback)
Substack	Article sync into the CMS	MVP	Celery polling job; per-article idempotent insert; sync failures non-fatal
Bookstore affiliate links	Purchase handoff (source ladder)	MVP	Stored as field on `Work`; no runtime dependency
HarperCollins	Content rights (no Narnia audio)	MVP	Enforced via `Work.rights` flag — chapter audio gated at the API
App Store / Play Store	Apple/Google Sign-In, account deletion (V1); IAP (Phase 2)	V1 + Phase 2	Receipt validation server-side; account-deletion endpoint must be production-tested before App Store submission
Sentry	Error tracking	MVP	Soft dependency — application continues if Sentry unreachable
AWS (RDS, ElastiCache, S3, CloudFront, ECS, Secrets Manager)	Managed services	MVP	AWS SLAs; multi-AZ for prod data tier
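
The "retry with exponential backoff" handling in the table comes down to a small piece of arithmetic. The sketch below shows that arithmetic and a retry loop around it using only the standard library; the base delay, cap, and attempt count are assumed values, and in the actual workers Celery's own task retry options would normally take the place of `call_with_retries`.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(
    attempt: int,
    base_seconds: float = 2.0,   # hypothetical first delay
    cap_seconds: float = 300.0,  # hypothetical upper bound
    rng: Optional[random.Random] = None,
) -> float:
    """Delay before retrying after failed attempt number `attempt` (0-based)."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    ceiling = min(cap_seconds, base_seconds * (2 ** attempt))
    if rng is None:
        return ceiling
    # Full jitter: spread retries so failed jobs do not all wake together.
    return rng.uniform(0.0, ceiling)


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on any exception; re-raise the last failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter lets tests record the delays instead of waiting for them.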

Security Architecture

Concern	Mitigation
CSRF (admin)	Django CSRF middleware enabled on all admin POSTs
XSS	Django template auto-escaping; Wagtail/Draftail sanitisation; reflection input HTML-escaped before render
SQL injection	Django ORM parameterised queries; no raw SQL on user input
Secrets	AWS Secrets Manager / SSM Parameter Store; never in code or container images
Transport	TLS 1.2+ via ALB and CloudFront; HSTS headers set by Django
Rate limiting	Redis-backed (`django-ratelimit`) on write endpoints; WAF rate limit on the public API
Auth (V1)	`django-allauth` (headless) + Apple/Google Sign-In + JWT; password hashing via Django's hashers (PBKDF2 by default; Argon2 if configured)
Account deletion (V1)	Dedicated endpoint with hard-delete + anonymise strategy per FK; must be production-tested before App Store submission
PII handling	V1 PII limited to email + pseudonym (accounts ship in V1); encryption at rest via RDS
Audit	Wagtail page history on content edits; Django admin log on user actions; structured request logs
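
To make the Redis-backed rate-limit counters concrete, here is a fixed-window limiter with a dictionary standing in for Redis. This is a generic illustration of the counting technique, not a description of how `django-ratelimit` works internally; the class name, the key format, and the limits used are all hypothetical.

```python
from typing import Callable


class FixedWindowLimiter:
    """Counts requests per key per time window; a dict stands in for Redis."""

    def __init__(
        self, limit: int, window_seconds: int, clock: Callable[[], float]
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._counters: dict[tuple[str, int], int] = {}

    def allow(self, key: str) -> bool:
        """Record one request for `key` and report whether it is within limit."""
        window = int(self._clock() // self.window_seconds)
        bucket = (key, window)
        count = self._counters.get(bucket, 0) + 1
        # Old buckets are never pruned in this sketch; with Redis, a TTL on
        # the counter key would expire them automatically.
        self._counters[bucket] = count
        return count <= self.limit
```

A fixed window is the simplest scheme and allows short bursts at window boundaries; a sliding window or token bucket trades a little complexity for smoother limits.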

Scaling Characteristics

Tier	At MVP	Headroom
ALB	Auto-managed	—
Django web (Fargate)	2 tasks; autoscale on CPU + request count	Horizontal scale to dozens; Phase 2 may split read/write tiers
Celery workers	2 tasks per queue; autoscale on queue depth	Add dedicated queues per job type as needed
Postgres	`db.t4g.medium` (single in staging, multi-AZ in prod)	Vertical scale first; read replicas in Phase 2 if needed
Redis	`cache.t4g.micro`	Vertical scale first; cluster mode in Phase 2 if needed
S3 / CloudFront	Auto-managed	—

Performance Targets

Surface	p95 target	Notes
Today's Passage API	< 100 ms (cache hit) / < 300 ms (cache miss)	Cache TTL aligned to Daily Drop cadence
Dive-deeper API	< 200 ms (cache hit) / < 500 ms (cache miss)	Postgres GIN at MVP; Meilisearch swap if p95 degrades
Editor publish → reader visibility	< 30 s	Cache invalidation + downstream Celery jobs
Audio playback start	< 1 s	CloudFront-cached MP3/AAC; pre-generated, not on-demand
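
Checking a p95 target requires agreeing on how the percentile is computed. The sketch below uses the nearest-rank definition, which is one common convention and an assumption here; monitoring tools may interpolate and report slightly different values. The function names are hypothetical.

```python
def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_ms)
    # Integer ceiling of 0.95 * n, giving a 1-based rank without float error.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_target(samples_ms: list[float], target_ms: float) -> bool:
    """True when the p95 latency is strictly below the target."""
    return p95(samples_ms) < target_ms
```

With 20 samples the rank is 19, so a single slow outlier does not fail the check but two do, which is the behaviour a p95 target is meant to have.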

Open Items

Domain / DNS — to be confirmed with client.
Account-deletion strategy (V1) — FK on-delete strategies (cascade vs anonymise) to be defined in the first sprint; must be production-tested before App Store submission.
Passage vs Article modeling — to resolve in docs/cms-architecture.md before first sprint commits.

Related Documents

docs/adr/0001-backend-architecture.md — decision rationale, considered alternatives, production credibility
docs/cms-architecture.md — content data model the Wagtail/Django models encode
docs/guides/deployment.md — deployment pipeline, IaC layout, rollback
docs/api-overview.md — API conventions (auth, errors, pagination, caching)
Internal working memory (.context/) — product domain, scope, constraints