AI-generated video production pipeline with HyperFrames

Behind the Build · 20 June 2026

How We Made Our First AI-Generated Production Video — In 100 Seconds

We just published our first AI-generated video across X, Facebook, TikTok, and YouTube. It's 98 seconds long, has 16 scenes and a custom voiceover, and was built entirely with an AI agent working alongside me.

Here's the full breakdown.

Watch It First

The video is also on YouTube, TikTok, X, and Facebook. Go watch it, then come back for the how.

The Challenge

I wanted to create a recap video showing everything I built in two weeks: 16 different screens covering database schemas, cron jobs, trading simulations, security dashboards, and more. The catch? I wanted it to look production-quality. Not a slideshow. Not a screen recording with a voiceover slapped on top. A real video.

The traditional approach:

Hire a video editor (R5,000–20,000+)
Spend 1–2 weeks going back and forth
Hope they understand what you're trying to show

Instead, I used an AI agent to build it. Here's how.

The Stack

HyperFrames — Video as code. HTML + CSS + GSAP animations rendered to MP4
Hermes Agent — AI orchestrator managing the full pipeline
Edge TTS (AndrewNeural) — Per-beat voiceover generation
FFmpeg — Audio/video processing, compression, muxing
WSL — Local rendering, no cloud GPU needed

Total cost: R0. No cloud rendering fees. No editor fees. No subscriptions.

The Process

Step 1: Capture Screenshots

Every screen in the video is a real screenshot from the Hermes OS Mission Control dashboard — Supabase schema, GitHub repo, cron jobs, activity logs, security hub, game dev tracker, social media dashboard, trading platform, and more. 16 beats, 16 real screens.

Step 2: Write Narration Scripts

Each beat got a short, punchy script — about 5–6 seconds of spoken text. The AI agent wrote the first drafts, and I edited them for tone and accuracy. The final narration covers:

Database schema with 13 tables
GitHub repository structure
30 automated cron jobs
1,000 activity log events
Security audit scores (62→87, 45→92)
Cybersecurity operations hub
Game development tracker
Social media dashboard
Monte Carlo trading simulations
Live trading metrics
Mission Control overview
Daily synthesis jobs
GitHub commit stats
AgenticBiz platform
YouTube channel launch

Step 3: Generate & Sync Voiceover

Each beat's narration was generated separately with Edge TTS. Every clip was then trimmed or padded to fit its exact time slot. Some beats needed more time (like the Hush security hub and AgenticBiz), so we redistributed time from shorter beats.
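Here's a minimal sketch of what per-beat generation could look like. It assumes the open-source edge-tts command-line tool and the full voice name en-US-AndrewNeural (the stack list only says "AndrewNeural"). The narration lines, the audio/ folder, and the file names are placeholders, not the real script.

```python
from pathlib import Path

# Assumption: the full en-US name of the "AndrewNeural" voice.
VOICE = "en-US-AndrewNeural"


def tts_command(beat_index: int, text: str, out_dir: str = "audio") -> list[str]:
    """Build the edge-tts CLI call that writes one beat's narration to its own MP3."""
    out_path = Path(out_dir) / f"beat_{beat_index:02d}.mp3"
    return [
        "edge-tts",
        "--voice", VOICE,
        "--text", text,
        "--write-media", out_path.as_posix(),
    ]


# Placeholder narration, not the real script.
beats = ["Thirteen tables, one schema.", "Thirty cron jobs run every day."]
commands = [tts_command(i, text) for i, text in enumerate(beats, start=1)]
for cmd in commands:
    print(" ".join(cmd))
```

Each command list can be passed to subprocess.run(cmd, check=True). Keeping one file per beat is what makes it possible to fit each clip to its slot on its own.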

The key insight: you can't just pad everything. If your audio is longer than your video slot, it bleeds into the next beat. We had to trim long recordings and pad short ones to hit exactly 98 seconds.
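One way to do the trim-and-pad step, sketched in Python. This is an illustration, not necessarily the exact commands the agent ran. FFmpeg's apad filter appends silence and -t caps the output length, so a single call covers both cases. The clip and slot lengths are made-up numbers.

```python
def fit_action(clip_seconds: float, slot_seconds: float, tolerance: float = 0.01) -> str:
    """Say whether a clip must be trimmed, padded, or left alone to fill its slot."""
    if clip_seconds > slot_seconds + tolerance:
        return "trim"
    if clip_seconds < slot_seconds - tolerance:
        return "pad"
    return "keep"


def fit_command(src: str, dst: str, slot_seconds: float) -> list[str]:
    """FFmpeg call that forces a clip to exactly slot_seconds.

    apad appends silence to the end of the audio; -t then cuts the output
    at the slot length, so long clips are trimmed and short clips are padded.
    """
    return [
        "ffmpeg", "-y", "-i", src,
        "-af", "apad",
        "-t", f"{slot_seconds:.3f}",
        dst,
    ]


# Hypothetical measurements: (clip length, slot length) in seconds.
examples = [(7.4, 6.5), (4.9, 6.0), (6.0, 6.0)]
actions = [fit_action(clip, slot) for clip, slot in examples]
print(actions)
```

A hard trim cuts whatever is still playing at the end of the slot, so a clip flagged "trim" is usually a sign the script for that beat needs fewer words.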

Step 4: Build the Composition

The HTML composition has 16 scene divs, each with a particle background, REC indicator, section label, headline, screenshot container, source URL, and tag captions. GSAP handles all the animations — scene transitions, label/headline entrances, screen scrolling, tag fade-ins, and the progress bar.
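A small sketch of how scene timing could be derived so the animation timeline and the audio share one source of truth. The slot values below are hypothetical: 14 beats of 6 seconds and two longer beats of 7 seconds, chosen only because they add up to 98 seconds across 16 scenes. The real slot lengths aren't published here.

```python
from itertools import accumulate


def scene_starts(slots: list[float]) -> list[float]:
    """Start time of each scene: the running sum of all earlier slots."""
    return list(accumulate(slots, initial=0.0))[:-1]


# Hypothetical slot lengths in seconds (16 scenes, 98 s total).
slots = [6.0] * 16
slots[5] = 7.0
slots[13] = 7.0
assert sum(slots) == 98.0

starts = scene_starts(slots)
print(starts[:3], starts[-1])
```

The same list of start times could be written out as JSON and read by the composition, so a change to one slot moves every later scene automatically.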

Step 5: Render

One command. 13 minutes on WSL. No GPU needed.

npx hyperframes render . -c index.html -o renders/output.mp4 --width 1080 --height 1920 --quality draft --fps 30 --no-browser-gpu
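For a sense of scale, the figures above work out to roughly 2,940 frames at about a quarter of a second each. A quick check, treating the 13 minutes as approximate:

```python
FPS = 30
DURATION_S = 98
RENDER_MINUTES = 13  # approximate, as reported

frames = FPS * DURATION_S
seconds_per_frame = RENDER_MINUTES * 60 / frames
print(frames, round(seconds_per_frame, 2))
```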

Step 6: Compress & Distribute

ffmpeg -i <input.mp4> -crf 30 -preset fast -c:a aac -b:a 96k <output.mp4>

Final size: 5.8 MB. Small enough for every social platform.
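As a rough sanity check on that file size, here is the average bitrate it implies. This assumes megabytes of 10^6 bytes, ignores container overhead, and treats the 96k audio setting as the actual audio bitrate, so the result is an estimate only.

```python
SIZE_MB = 5.8      # reported file size, treated as 10^6-byte megabytes
DURATION_S = 98
AUDIO_KBPS = 96    # from the -b:a 96k flag

total_kbps = SIZE_MB * 1_000_000 * 8 / DURATION_S / 1000
video_kbps = total_kbps - AUDIO_KBPS
print(round(total_kbps), round(video_kbps))
```

That leaves well under 400 kbps for 1080×1920 video, which is plausible for mostly static screenshots with light animation but would likely be too low for busy live footage.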

What We Learned

1. Audio sync is everything. The number one issue we hit was voiceover running past video end times. The fix: trim AND pad every beat's audio to fit its exact slot. Never rely on padding alone.
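A check like the following could run before every render to catch overruns early. It's a sketch: probe_seconds relies on ffprobe being installed, and the durations in the example are invented.

```python
import subprocess


def probe_seconds(path: str) -> float:
    """Ask ffprobe for a media file's duration in seconds."""
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def overruns(durations: list[float], slots: list[float], tolerance: float = 0.02) -> list[int]:
    """Return 1-based beat numbers whose audio is longer than its slot."""
    return [
        i for i, (d, s) in enumerate(zip(durations, slots), start=1)
        if d > s + tolerance
    ]


# Hypothetical numbers; in practice durations would come from probe_seconds().
late_beats = overruns([5.8, 6.4, 6.0], [6.0, 6.0, 6.0])
print(late_beats)
```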

2. Verify everything visually. We extracted frames from the rendered video and checked every single beat to confirm the correct screenshot was showing. Don't trust the HTML — verify the output.
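One way to script that check is to pull a single frame from the middle of each beat. The sketch below only builds the FFmpeg commands. The slot lengths and the frames/ output folder are assumptions, and the folder has to exist before the commands run.

```python
def midpoint_frame_commands(video: str, slots: list[float], out_dir: str = "frames") -> list[list[str]]:
    """One FFmpeg call per beat, grabbing a single frame from the middle of its slot."""
    commands = []
    start = 0.0
    for beat, slot in enumerate(slots, start=1):
        midpoint = start + slot / 2
        commands.append([
            "ffmpeg", "-y",
            "-ss", f"{midpoint:.2f}",  # seek to the middle of the beat
            "-i", video,
            "-frames:v", "1",          # write exactly one frame
            f"{out_dir}/beat_{beat:02d}.png",
        ])
        start += slot
    return commands


# Hypothetical slot lengths for the first three beats.
cmds = midpoint_frame_commands("renders/output.mp4", [6.0, 6.0, 7.0])
print([c[3] for c in cmds])
```

Sampling the midpoint avoids the transition at either edge of a beat, so each image shows the screenshot that was meant to be on screen.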

3. Keep text short. Headlines should run to two lines at most. Narration must fit the slot. If it doesn't fit, cut words rather than slowing down the pacing.

4. Consistent tag styling. Every beat uses the same pill-shaped tag pattern. Consistency makes it feel professional.

5. The AI agent is a collaborator, not a replacement. It handled the heavy lifting — TTS generation, audio stitching, composition management, rendering. But I made every creative decision — what to show, what to say, how to pace it.

What's Coming Next

This was just the beginning. Here's what's in the pipeline:

Real Voiceovers — The next version will use a clone of my own voice instead of a stock TTS voice, so the narration sounds like me.

Long-Form Landscape Videos — 16:9 format, longer runtime, deeper dives. Think 3–5 minute explainer videos, tutorials, and product demos.

More Short-Form Content — We've got the template locked in. Expect more 90-second recaps, feature highlights, and behind-the-scenes content.

Fully Automated Pipeline — The goal is to go from "idea" to "published video" with minimal human intervention. We're close.

The Bigger Picture

This isn't just about making videos. It's about what's possible when you combine AI agents with creative tools. A single person — with an AI agent — can now produce content that used to require a team of editors, animators, and voice actors.

That's the future we're building at AgenticBiz. Not just AI agents that talk — AI agents that create.

Ready to Build Your Own?

If you're a business owner, creator, or team lead — imagine having your own AI agent that can produce content like this for you. On demand. At scale.

Get your agent →

Or check out the video on TikTok, YouTube, X, and Facebook.

Built with ❤️ by Akhil Pillay — AI Agent Practitioner & Educator, and the human behind AgenticBiz.

This post was written with Hermes — Akhil's AI agent — and vetted by Akhil before publishing.
