
How We Made Our First AI-Generated Production Video — In 100 Seconds
We just published our first AI-generated video across X, Facebook, TikTok, and YouTube. It's 98 seconds long, has 16 scenes, custom voiceover, and was built entirely with an AI agent working alongside me.
Here's the full breakdown.
Watch It First
The video is also on YouTube, TikTok, X, and Facebook. Go watch it, then come back for the how.
The Challenge
I wanted to create a recap video showing everything I built in two weeks — 16 different screens, database schemas, cron jobs, trading simulations, security dashboards, and more. The catch? I wanted it to look production quality. Not a slideshow. Not a screen recording with a voiceover slapped on top. A real video.
The traditional approach:
- Hire a video editor (R5,000–20,000+)
- Spend 1–2 weeks going back and forth
- Hope they understand what you're trying to show
Instead, I used an AI agent to build it. Here's how.
The Stack
- HyperFrames — Video as code. HTML + CSS + GSAP animations rendered to MP4
- Hermes Agent — AI orchestrator managing the full pipeline
- Edge TTS (AndrewNeural) — Per-beat voiceover generation
- FFmpeg — Audio/video processing, compression, muxing
- WSL — Local rendering, no cloud GPU needed
Total cost: R0. No cloud rendering fees. No editor fees. No subscriptions.
The Process
Step 1: Capture Screenshots
Every screen in the video is a real screenshot from the Hermes OS Mission Control dashboard — Supabase schema, GitHub repo, cron jobs, activity logs, security hub, game dev tracker, social media dashboard, trading platform, and more. 16 beats, 16 real screens.
Step 2: Write Narration Scripts
Each beat got a short, punchy script — about 5–6 seconds of spoken text. The AI agent wrote the first drafts, and I edited them for tone and accuracy. The final narration covers:
- Database schema with 13 tables
- GitHub repository structure
- 30 automated cron jobs
- 1,000 activity log events
- Security audit scores (62→87, 45→92)
- Cybersecurity operations hub
- Game development tracker
- Social media dashboard
- Monte Carlo trading simulations
- Live trading metrics
- Mission Control overview
- Daily synthesis jobs
- GitHub commit stats
- AgenticBiz platform
- YouTube channel launch
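A rough pre-check can flag scripts that are likely to overrun before any audio is generated. The sketch below estimates spoken length from word count. The speaking rate of 2.5 words per second is an assumption rather than a measured figure for the TTS voice, and the sample line is made up for illustration, not taken from the real narration.

```python
# Assumed speaking rate; measure your own TTS voice before relying on it.
WORDS_PER_SECOND = 2.5


def estimate_seconds(script: str, words_per_second: float = WORDS_PER_SECOND) -> float:
    """Rough spoken length of a script, derived from its word count."""
    return len(script.split()) / words_per_second


def fits_slot(script: str, slot_seconds: float) -> bool:
    """True if the estimated spoken length fits inside the beat's time slot."""
    return estimate_seconds(script) <= slot_seconds


# Hypothetical narration line (12 words), not the script used in the video.
sample = "Thirty automated cron jobs keep the whole system running around the clock."
print(round(estimate_seconds(sample), 1), fits_slot(sample, 6.0))
```

The estimate only catches obvious overruns. The measured length of the generated clip is still what decides whether a beat needs trimming.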
Step 3: Generate & Sync Voiceover
Each beat's narration was recorded separately using Edge TTS. Then every clip was trimmed or padded to fit its exact time slot — some beats needed more time (like the Hush security hub and AgenticBiz), so we redistributed time from shorter beats.
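One way to generate a separate clip per beat is to build one `edge-tts` command-line call for each script. This is a sketch under assumptions, not necessarily how the agent did it: it assumes the open-source `edge-tts` package is installed, that the voice is the en-US variant of AndrewNeural, and the beat texts and file names are hypothetical.

```python
import shlex

# Hypothetical beat scripts; the real narration is not reproduced here.
BEATS = {
    1: "Thirteen tables, one schema.",
    2: "Thirty cron jobs, running on schedule.",
}

# Assumption: the en-US locale of the "AndrewNeural" voice named in the stack.
VOICE = "en-US-AndrewNeural"


def tts_command(beat: int, text: str, voice: str = VOICE) -> list[str]:
    """Build the edge-tts CLI call that writes one MP3 for one beat."""
    return [
        "edge-tts",
        "--voice", voice,
        "--text", text,
        "--write-media", f"beat_{beat:02d}.mp3",  # hypothetical file naming
    ]


commands = [tts_command(n, text) for n, text in sorted(BEATS.items())]
for cmd in commands:
    # Printed for inspection only; to run one, use subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Keeping one file per beat is what makes the later per-slot trimming and padding possible.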
The key insight: you can't just pad everything. If your audio is longer than your video slot, it bleeds into the next beat. We had to trim long recordings and pad short ones to hit exactly 98 seconds.
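The trim-or-pad decision can be sketched as a comparison between each clip's measured length and its slot. The durations and file names below are hypothetical. The FFmpeg call uses the `apad` filter, which appends silence, together with `-t`, which cuts the output at the slot length, so a single command handles both cases. This is one possible approach, not necessarily the exact commands used for the video.

```python
def fit_plan(clip_seconds: float, slot_seconds: float) -> tuple[str, float]:
    """Say whether a clip must be trimmed or padded to fill its slot, and by how much."""
    diff = round(slot_seconds - clip_seconds, 3)
    if diff < 0:
        return ("trim", -diff)
    if diff > 0:
        return ("pad", diff)
    return ("keep", 0.0)


def fit_command(src: str, dst: str, slot_seconds: float) -> list[str]:
    """FFmpeg call that pads with silence, then cuts at exactly the slot length."""
    return ["ffmpeg", "-y", "-i", src, "-af", "apad", "-t", f"{slot_seconds:.3f}", dst]


# Hypothetical measurements for three beats (seconds).
clips = [5.2, 7.4, 5.5]
slots = [6.0, 7.0, 5.5]

for i, (clip, slot) in enumerate(zip(clips, slots), start=1):
    action, amount = fit_plan(clip, slot)
    print(f"beat {i:02d}: {action} {amount:.1f}s")
    print("  ", " ".join(fit_command(f"beat_{i:02d}.mp3", f"beat_{i:02d}_fit.wav", slot)))
```

A hard cut can clip the last word of a long recording, so any beat marked "trim" is worth checking by ear, or the script shortened instead.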
Step 4: Build the Composition
The HTML composition has 16 scene divs, each with a particle background, REC indicator, section label, headline, screenshot container, source URL, and tag captions. GSAP handles all the animations — scene transitions, label/headline entrances, screen scrolling, tag fade-ins, and the progress bar.
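Because every scene has a fixed slot, each scene's start time is just the running sum of the slots before it. The sketch below computes those windows. The slot values and the `scene-NN` ids are hypothetical, and how the numbers are fed into the composition (for example as GSAP timeline positions) is left open.

```python
from itertools import accumulate


def scene_windows(slot_seconds: list[float]) -> list[tuple[str, float, float]]:
    """Return (scene id, start, end) for each scene, laid end to end from t=0."""
    edges = [0.0, *accumulate(slot_seconds)]
    return [
        (f"scene-{i:02d}", round(start, 3), round(end, 3))
        for i, (start, end) in enumerate(zip(edges, edges[1:]), start=1)
    ]


# Hypothetical slots for the first three beats (seconds).
for scene_id, start, end in scene_windows([6.0, 7.0, 5.5]):
    print(scene_id, start, end)
```

Deriving the windows from the same slot list that drives the audio keeps the visuals and the voiceover on one source of truth.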
Step 5: Render
One command. 13 minutes on WSL. No GPU needed.
npx hyperframes render . -c index.html -o renders/output.mp4 --width 1080 --height 1920 --quality draft --fps 30 --no-browser-gpu
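For a sense of scale, the figures above imply the following render throughput. This is plain arithmetic on the numbers stated in this post (98 seconds, 30 fps, about 13 minutes), so it is only as exact as the 13-minute figure.

```python
DURATION_S = 98        # video length in seconds
FPS = 30               # from the --fps flag
RENDER_MINUTES = 13    # reported render time on WSL

frames = DURATION_S * FPS                      # total frames rendered
render_seconds = RENDER_MINUTES * 60
frames_per_second = frames / render_seconds    # render speed
slowdown = render_seconds / DURATION_S         # render seconds per second of video

print(frames)                       # 2940
print(round(frames_per_second, 2))  # 3.77
print(round(slowdown, 1))           # 8.0
```

In other words, each second of finished video took roughly eight seconds to render on CPU.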
Step 6: Compress & Distribute
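The compression pass, with the render from Step 5 as input (the output filename here is illustrative):
ffmpeg -i renders/output.mp4 -crf 30 -preset fast -c:a aac -b:a 96k renders/output-compressed.mp4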
Final size: 5.8 MB. Small enough for every social platform.
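The file size and runtime give the average bitrate directly. The calculation below assumes 1 MB means 1,000,000 bytes and ignores container overhead, so treat the video figure as an estimate.

```python
SIZE_MB = 5.8       # final file size
DURATION_S = 98     # video length in seconds
AUDIO_KBPS = 96     # audio target from -b:a 96k

total_kbps = SIZE_MB * 1000 * 8 / DURATION_S   # megabytes -> kilobits, per second
video_kbps = total_kbps - AUDIO_KBPS           # what is left for the video stream

print(round(total_kbps))  # 473
print(round(video_kbps))  # 377
```

An average of under 500 kbit/s in total is low for 1080x1920 video, which fits content made mostly of static screenshots and simple animation.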
What We Learned
1. Audio sync is everything. The number one issue we hit was voiceover running past video end times. The fix: trim AND pad every beat's audio to fit its exact slot. Never rely on padding alone.
2. Verify everything visually. We extracted frames from the rendered video and checked every single beat to confirm the correct screenshot was showing. Don't trust the HTML — verify the output.
3. Keep text short. Headlines must be two lines at most. Narration must fit the slot. If it doesn't fit, cut it; don't slow down the pacing.
4. Consistent tag styling. Every beat uses the same pill-shaped tag pattern. Consistency makes it feel professional.
5. The AI agent is a collaborator, not a replacement. It handled the heavy lifting — TTS generation, audio stitching, composition management, rendering. But I made every creative decision — what to show, what to say, how to pace it.
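The visual check from lesson 2 can be sketched as one FFmpeg frame grab at the midpoint of each beat. The slot values and file names are hypothetical, and this is one way to do the check, not necessarily the commands used here.

```python
def frame_commands(video: str, slot_seconds: list[float]) -> list[list[str]]:
    """Build one FFmpeg call per beat that saves the frame at the beat's midpoint."""
    commands = []
    start = 0.0
    for i, slot in enumerate(slot_seconds, start=1):
        midpoint = start + slot / 2
        commands.append([
            "ffmpeg", "-y",
            "-ss", f"{midpoint:.2f}",   # seek to the middle of the beat
            "-i", video,
            "-frames:v", "1",           # write a single frame
            f"check_{i:02d}.png",       # hypothetical output name
        ])
        start += slot
    return commands


# Hypothetical slots for the first three beats (seconds).
for cmd in frame_commands("renders/output.mp4", [6.0, 7.0, 5.5]):
    print(" ".join(cmd))
```

Sampling the midpoint avoids the transition at either edge of a beat, so each saved frame should show that beat's screenshot fully on screen.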
What's Coming Next
This was just the beginning. Here's what's in the pipeline:
Real Voiceovers: The next version will use my own voice instead of a stock TTS voice, with voice cloning so the narration sounds like me.
Long-Form Landscape Videos: 16:9 format, longer runtime, deeper dives. Think 3–5 minute explainer videos, tutorials, and product demos.
More Short-Form Content: We've got the template locked in. Expect more 90-second recaps, feature highlights, and behind-the-scenes content.
Fully Automated Pipeline: The goal is to go from "idea" to "published video" with minimal human intervention. We're close.
The Bigger Picture
This isn't just about making videos. It's about what's possible when you combine AI agents with creative tools. A single person — with an AI agent — can now produce content that used to require a team of editors, animators, and voice actors.
That's the future we're building at AgenticBiz. Not just AI agents that talk — AI agents that create.
Ready to Build Your Own?
If you're a business owner, creator, or team lead — imagine having your own AI agent that can produce content like this for you. On demand. At scale.
You can also check out the video on TikTok, YouTube, X, and Facebook.
Built with ❤️ by Akhil Pillay — AI Agent Practitioner & Educator, and the human behind AgenticBiz.
This post was written with Hermes — Akhil's AI agent — and vetted by Akhil before publishing.
