Founder, AI / Backend Engineer · 2026
#Python #asyncio #Typer #Pydantic #Claude #fal.ai #Replicate #ElevenLabs #ffmpeg

Aginx

A content factory: from a text brief it automatically assembles a finished vertical video up to 60 seconds for Reels / Shorts / TikTok. An LLM writes the scene-by-scene script with a hook and voice direction, while the pipeline generates video, voice, music and subtitles and stitches them into one file via ffmpeg. One loop: brief → script → media → cut.

A production experiment: can shooting a short video collapse into a single "brief → finished file" loop, with the LLM acting as director and generative models as the crew? The constraints: no manual editing, repeatable runs, and fallbacks when a provider fails.

Context

Personal branding and content marketing hit the same wall: the idea is there, but shooting, editing, voiceover and subtitles eat hours. Aginx removes the manual labor from that chain — a short free-form brief goes in, a finished vertical 9:16 video for Reels, Shorts or TikTok comes out.

How it works

A four-stage loop in which each stage is an isolated step:

  • Orchestrator — Claude takes the brief and returns a strict JSON script: timed scenes, visual prompts (English, cinematic), narration (Russian), mood, camera motion, a music prompt and an opening hook.
  • Visual — by content type: broll (per-scene video), slideshow (images with motion) or talking_head (an animated avatar speaking the narration).
  • Audio — voice synthesis and background music; narration is generated before the paid video step, so no budget is burned if TTS fails.
  • Assembly — ffmpeg stitches the scenes, mixes voice and music, burns in ASS subtitles and outputs a single .mp4.
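The script contract from the Orchestrator stage can be sketched roughly as follows. This is a minimal stand-in built on stdlib dataclasses rather than the project's actual Pydantic schema, so it only checks keys, content type and timing, not field types. Every field name here (start, end, visual_prompt, camera_motion and so on) is an assumption derived from the list above, not the real schema.

```python
import json
from dataclasses import dataclass

CONTENT_TYPES = {"broll", "talking_head", "slideshow"}
MAX_DURATION_S = 60.0  # the finished video is capped at 60 seconds


@dataclass(frozen=True)
class Scene:
    # Hypothetical field names, for illustration only.
    start: float          # seconds from the beginning of the video
    end: float
    visual_prompt: str    # English, cinematic
    narration: str        # text to be voiced
    mood: str
    camera_motion: str


@dataclass(frozen=True)
class Script:
    content_type: str
    hook: str
    music_prompt: str
    scenes: list[Scene]


def parse_script(raw: str) -> Script:
    """Parse the LLM reply and reject anything that is off-contract."""
    data = json.loads(raw)
    # A missing "scenes" key raises KeyError; unknown or missing
    # fields raise TypeError from the dataclass constructors.
    scenes = [Scene(**item) for item in data.pop("scenes")]
    script = Script(scenes=scenes, **data)

    if script.content_type not in CONTENT_TYPES:
        raise ValueError(f"unknown content_type: {script.content_type!r}")
    if not scenes:
        raise ValueError("script has no scenes")

    cursor = 0.0
    for scene in scenes:
        # Scenes must be ordered, non-overlapping and non-empty.
        if scene.start < cursor or scene.end <= scene.start:
            raise ValueError(f"bad timing in scene starting at {scene.start}")
        cursor = scene.end
    if cursor > MAX_DURATION_S:
        raise ValueError(f"script runs {cursor}s, limit is {MAX_DURATION_S}s")
    return script
```

Rejecting a malformed script at this point is what lets the later stages read structure instead of free text: a bad reply fails before any voice or video call is made.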

Engineering frame

  • One source of truth — the script. The LLM returns typed JSON against a Pydantic schema; everything downstream reads structure, not free text. Less ambiguity, easier to debug.
  • Providers with fallback. MediaProvider calls fal.ai first and switches to Replicate on error; voice synthesis goes through ElevenLabs or MiniMax. Models are a config parameter, not a hard-wired dependency.
  • Idempotency and resume. The script and intermediate files are written to disk immediately; an interrupted job can be resumed — finished scenes are skipped, not recomputed.
  • Order for the sake of cost. Cheap and risky steps run before expensive ones: script and voice first, paid video generation last.
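The fallback and resume behaviour described above might look something like the sketch below. It is not the actual MediaProvider code: the function names, the scene_NNN.mp4 file layout and the idea of a provider as a plain coroutine are all assumptions made for illustration, and no real fal.ai or Replicate client is called.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable, Sequence

# Assumption: a provider is any coroutine that turns a prompt into video bytes.
Provider = Callable[[str], Awaitable[bytes]]


async def generate_with_fallback(prompt: str, providers: Sequence[Provider]) -> bytes:
    """Try providers in order and return the first successful result."""
    errors: list[Exception] = []
    for provider in providers:
        try:
            return await provider(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(exc)
    raise RuntimeError(f"all {len(providers)} providers failed: {errors}")


async def render_scene(
    index: int, prompt: str, workdir: Path, providers: Sequence[Provider]
) -> Path:
    """Render one scene, skipping the paid call if the file already exists."""
    out = workdir / f"scene_{index:03d}.mp4"  # hypothetical naming scheme
    if out.exists() and out.stat().st_size > 0:
        return out  # finished in an earlier run, nothing to recompute

    data = await generate_with_fallback(prompt, providers)
    # Write to a temporary name and rename into place, so a crash
    # mid-write does not leave a file that looks finished.
    tmp = out.with_suffix(".part")
    tmp.write_bytes(data)
    tmp.replace(out)
    return out
```

Keying the skip on the output file keeps resume stateless: rerunning the same job against the same directory only pays for scenes that are still missing.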

Modes and control

  • Three content types: broll, talking_head, slideshow — the orchestrator picks one or you pin it.
  • Typer CLI: brief as a string or JSON file, fine voice tuning (stability, speed, pitch, emotion), --dry-run to preview the script only, resume to finish a job.
  • Config, not code: models, resolution, fps, music volume and voice parameters live in config.yaml.
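The shape of the CLI surface can be illustrated with a small parser. To stay dependency-free the sketch uses stdlib argparse as a stand-in for the project's Typer app. Only --dry-run, resume and the four voice parameters come from the description above; the program name, the make subcommand, the --content-type flag and the job_dir argument are hypothetical.

```python
import argparse

CONTENT_TYPES = ["broll", "talking_head", "slideshow"]
VOICE_PARAMS = ["stability", "speed", "pitch", "emotion"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="aginx")
    sub = parser.add_subparsers(dest="command", required=True)

    # Hypothetical "make" subcommand: brief in, finished video out.
    make = sub.add_parser("make", help="generate a video from a brief")
    make.add_argument("brief", help="brief text, or a path to a JSON file")
    make.add_argument(
        "--content-type",
        choices=CONTENT_TYPES,
        default=None,  # None means the orchestrator picks the type
        help="pin the content type instead of letting the orchestrator choose",
    )
    make.add_argument(
        "--dry-run",
        action="store_true",
        help="produce and print the script only, with no media generation",
    )
    # Voice overrides; None means "use the value from config.yaml".
    for name in ("stability", "speed", "pitch"):
        make.add_argument(f"--{name}", type=float, default=None)
    make.add_argument("--emotion", default=None)

    # "resume" finishes an interrupted job from its working directory.
    resume = sub.add_parser("resume", help="finish an interrupted job")
    resume.add_argument("job_dir", help="directory of the interrupted job")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Leaving every override at None keeps config.yaml as the default source of settings, with flags acting only as per-run exceptions.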

What I took away as an engineer

  • 1 loop: brief → script → media → file
  • 2 providers: fal.ai with fallback to Replicate
  • resume: finish interrupted jobs without recompute
  • The "LLM director + typed script + generative models" combo turns video production into a controllable pipeline rather than a chain of manual steps.
  • The hard part isn't generation but orchestration: stage ordering, fallbacks, idempotency and cost control matter more than the choice of any single model.
  • Aginx is the base for a content factory: the same scheme can be adapted to someone else's brand or product.