Aginx
A content factory: from a text brief it automatically assembles a finished vertical video up to 60 seconds for Reels / Shorts / TikTok. An LLM writes the scene-by-scene script with a hook and voice direction, while the pipeline generates video, voice, music and subtitles and stitches them into one file via ffmpeg. One loop: brief → script → media → cut.
A production experiment: can shooting a short video collapse into a single "brief → finished file" loop, where the LLM acts as director and generative models as the crew? The constraints: no manual editing, repeatable runs, and fallbacks when a provider fails.
Context
Personal branding and content marketing hit the same wall: the idea is there, but shooting, editing, voiceover and subtitles eat hours. Aginx removes the manual labor from that chain — a short free-form brief goes in, a finished vertical 9:16 video for Reels, Shorts or TikTok comes out.
How it works
A four-stage loop, each an isolated step:
- Orchestrator: Claude takes the brief and returns a strict JSON script: timed scenes, visual prompts (English, cinematic), narration (Russian), mood, camera motion, a music prompt and an opening hook.
- Visual: by content type, broll (per-scene video), slideshow (images with motion) or talking_head (an animated avatar speaking the narration).
- Audio: voice synthesis and background music; narration is generated before the paid video step, so no budget is burned if TTS fails.
- Assembly: ffmpeg stitches the scenes, mixes voice and music, burns in ASS subtitles and outputs a single .mp4.
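To make the orchestrator's output concrete, here is a minimal sketch of what a typed script could look like. The field names (start, duration, visual_prompt and so on) and the parse_script helper are assumptions for illustration; the project's actual schema is not shown here. Standard-library dataclasses stand in for Pydantic so the sketch runs without dependencies, which means field types are not checked the way a Pydantic model would check them.

```python
import json
from dataclasses import dataclass

CONTENT_TYPES = {"broll", "talking_head", "slideshow"}


@dataclass(frozen=True)
class Scene:
    start: float         # seconds from the start of the video (assumed field)
    duration: float      # seconds (assumed field)
    visual_prompt: str   # English, cinematic
    narration: str       # spoken text for this scene
    camera_motion: str


@dataclass(frozen=True)
class Script:
    content_type: str
    hook: str
    mood: str
    music_prompt: str
    scenes: tuple[Scene, ...]

    @property
    def total_duration(self) -> float:
        return sum(scene.duration for scene in self.scenes)


def parse_script(raw: str, max_seconds: float = 60.0) -> Script:
    """Parse the LLM's JSON reply and reject anything that is off-schema."""
    data = json.loads(raw)
    # Missing or unexpected keys raise TypeError in the dataclass constructors.
    scenes = tuple(Scene(**item) for item in data.pop("scenes"))
    script = Script(scenes=scenes, **data)
    if script.content_type not in CONTENT_TYPES:
        raise ValueError(f"unknown content_type: {script.content_type!r}")
    if not scenes:
        raise ValueError("script has no scenes")
    if script.total_duration > max_seconds:
        raise ValueError(f"script runs {script.total_duration:.1f}s, limit is {max_seconds:.0f}s")
    return script


# Usage with a hand-written reply standing in for the LLM response.
reply = json.dumps({
    "content_type": "broll",
    "hook": "Три привычки, которые экономят час в день",
    "mood": "energetic",
    "music_prompt": "upbeat lo-fi, 100 bpm",
    "scenes": [
        {"start": 0.0, "duration": 3.0, "visual_prompt": "close-up of an alarm clock at dawn",
         "narration": "Привычка первая.", "camera_motion": "slow push-in"},
        {"start": 3.0, "duration": 5.0, "visual_prompt": "desk with a handwritten plan",
         "narration": "Привычка вторая.", "camera_motion": "pan right"},
    ],
})
script = parse_script(reply)
print(script.total_duration)  # 8.0
```

Everything after this point in the pipeline would read Script and Scene objects rather than the raw LLM text, so a malformed reply fails before any paid generation starts.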
Engineering frame
- One source of truth — the script. The LLM returns typed JSON against a Pydantic schema; everything downstream reads structure, not free text. Less ambiguity, easier to debug.
- Providers with fallback. MediaProvider calls fal.ai first and switches to Replicate on error; voice via ElevenLabs / MiniMax. Models are a config parameter, not a hard-wired dependency.
- Idempotency and resume. The script and intermediate files are written to disk immediately; an interrupted job can be resumed, with finished scenes skipped rather than recomputed.
- Order for the sake of cost. Cheap and risky steps run before expensive ones: script and voice first, paid video generation last.
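The fallback and resume behavior described above can be sketched roughly as follows. Only the MediaProvider name comes from the text; its constructor, the generate method, the render_scenes helper and the scene_NN.mp4 file naming are hypothetical, and the backends are plain callables rather than real fal.ai or Replicate clients.

```python
from pathlib import Path
from typing import Callable, Sequence

# A backend takes a prompt and returns encoded media bytes (assumed interface).
Backend = Callable[[str], bytes]


class MediaProvider:
    """Try each backend in order and return the first successful result."""

    def __init__(self, backends: Sequence[tuple[str, Backend]]):
        self.backends = list(backends)

    def generate(self, prompt: str) -> bytes:
        errors = []
        for name, backend in self.backends:
            try:
                return backend(prompt)
            except Exception as exc:  # any backend failure triggers the fallback
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all backends failed: " + "; ".join(errors))


def render_scenes(prompts: Sequence[str], workdir: Path, provider: MediaProvider) -> list[Path]:
    """Generate one clip per scene, skipping clips that already exist on disk."""
    workdir.mkdir(parents=True, exist_ok=True)
    clips = []
    for index, prompt in enumerate(prompts):
        clip = workdir / f"scene_{index:02d}.mp4"
        if not clip.exists():
            tmp = clip.with_suffix(".part")
            tmp.write_bytes(provider.generate(prompt))
            # Rename only after a full write, so a partial file never counts as done.
            tmp.replace(clip)
        clips.append(clip)
    return clips
```

Calling render_scenes a second time on the same working directory only generates the clips that are missing, which is what makes resuming an interrupted job cheap. Writing to a temporary name and renaming afterwards is one way to keep a crash mid-download from being mistaken for a finished scene.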
Modes and control
- Three content types: broll, talking_head, slideshow; the orchestrator picks one or you pin it.
- Typer CLI: brief as a string or JSON file, fine voice tuning (stability, speed, pitch, emotion), --dry-run to preview the script only, resume to finish a job.
- Config, not code: models, resolution, fps, music volume and voice parameters live in config.yaml.
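As an illustration of how values such as resolution, fps and music volume might travel from config.yaml into the assembly step, here is a sketch that builds an ffmpeg command from a plain dict standing in for the parsed config. The key names, the file names and the build_ffmpeg_command helper are assumptions; the filter graph is one plausible way to concatenate scenes, mix voice with music and burn in ASS subtitles, not necessarily the one Aginx uses.

```python
from pathlib import Path

# Hypothetical config values; the real keys in config.yaml are not shown in the text.
CONFIG = {"width": 1080, "height": 1920, "fps": 30, "music_volume": 0.2}


def build_ffmpeg_command(scene_list: Path, voice: Path, music: Path,
                         subtitles: Path, output: Path, config: dict) -> list[str]:
    """Return an ffmpeg argv that joins scenes, mixes audio and burns in subtitles."""
    # Video: normalize size and frame rate, then render the ASS subtitles onto the frames.
    video_chain = (
        f"[0:v]scale={config['width']}:{config['height']},"
        f"fps={config['fps']},ass={subtitles.as_posix()}[v]"
    )
    # Audio: lower the music, then mix it under the voice; the mix ends with the voice track.
    audio_chain = (
        f"[2:a]volume={config['music_volume']}[bed];"
        "[1:a][bed]amix=inputs=2:duration=first[a]"
    )
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", str(scene_list),  # concat demuxer list file
        "-i", str(voice),
        "-i", str(music),
        "-filter_complex", f"{video_chain};{audio_chain}",
        "-map", "[v]", "-map", "[a]",
        "-c:v", "libx264", "-c:a", "aac",
        "-shortest",  # stop at the shorter of the video and the mixed audio
        str(output),
    ]


command = build_ffmpeg_command(
    Path("scenes.txt"), Path("voice.mp3"), Path("music.mp3"),
    Path("subs.ass"), Path("out.mp4"), CONFIG,
)
print(" ".join(command))
```

The list can be passed to subprocess.run(command, check=True) on a machine where ffmpeg is built with libass. Subtitle paths that contain colons, backslashes or quotes would need extra escaping inside the filter graph, which this sketch does not handle.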
What I took away as an engineer
- The "LLM director + typed script + generative models" combo turns video production into a controllable pipeline rather than a chain of manual steps.
- The hard part isn't generation but orchestration: stage ordering, fallbacks, idempotency and cost control matter more than the choice of any single model.
- Aginx is the base for a content factory: the same scheme can be adapted to someone else's brand or product.