V2 Complete · Developer Tool · Python CLI

Talking-Head Footage
→ Viral 9:16
Automatically.

A Python CLI that converts raw video into optimized vertical clips — face tracking, punch zoom, word-level captions, and loudnorm audio, all from a single command.

$ clip-trimmer input.mp4 --output clip.mp4 --verbose
✓ Audio extracted — 00:04:12
✓ Speech segments detected — 94% speech ratio
✓ Whisper transcript — 847 words, word-level timestamps
✓ Face detected — tracking 1 subject
✓ Crop plan — 9:16 face-centred, 3 punch zones
✓ Captions built — 214 groups, active-word highlight
✓ Segments rendered — 14 segments, loudnorm pass
✓ Done — clip.mp4 (00:03:58) in 47.2s
TDD · Test-driven dev
V2 · Current version
9:16 · Output format
1 cmd · To run

One Command.
Six Stages.

Every step is deterministic, testable in isolation, and configurable via CLI flags.

01 / AUDIO
Extract & Detect Speech
WebRTC VAD strips silence. Only real speech regions go forward.
02 / TRANSCRIPT
Word-Level Timestamps
OpenAI Whisper produces per-word timestamps with model caching.
03 / FACE
Face Tracking
Detects and tracks the main subject for a face-centred 9:16 crop.
04 / EDIT
Crop & Punch Zoom Plan
Per-segment scale→crop pipeline with slow-zoom for engagement.
05 / CAPTIONS
ASS Caption Burn
3–5 word groups with active-word highlight — burned into the output.
06 / RENDER
FFmpeg Render & Loudnorm
Segments rendered and concatenated. Audio normalised to the EBU R128 broadcast loudness standard.
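The stages are described as configurable via CLI flags. As a rough illustration only, the flags that appear on this page (`--output`, `--verbose`, `--silence-gap`) could be wired up with `argparse` as below. The defaults, the seconds unit for `--silence-gap`, and the `build_parser` helper name are assumptions, not the tool's actual interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown on this page."""
    parser = argparse.ArgumentParser(
        prog="clip-trimmer",
        description="Convert talking-head footage into a vertical 9:16 clip.",
    )
    parser.add_argument("input", help="path to the source video")
    # Default output name is an assumption made for this sketch.
    parser.add_argument("--output", default="clip.mp4", help="path of the rendered clip")
    # Unit (seconds) and default value are assumptions.
    parser.add_argument(
        "--silence-gap",
        type=float,
        default=0.5,
        help="minimum silent gap to cut, assumed to be in seconds",
    )
    parser.add_argument("--verbose", action="store_true", help="print per-stage progress")
    return parser


# Parse the same invocation used in the terminal demo.
args = build_parser().parse_args(["input.mp4", "--output", "clip.mp4", "--verbose"])
```

Passing an explicit argument list to `parse_args` keeps each stage's configuration easy to exercise in unit tests without touching `sys.argv`.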

Built for Reliability.

Face-Centred Crop

Automatically centres the 9:16 crop on the detected face. Falls back to centre crop when no face is present.
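A minimal sketch of the crop geometry described above, assuming a landscape source, a full-height crop, and a single horizontal face centre per frame. The function name `crop_window` and the even-width rounding are illustrative choices, not taken from the tool.

```python
from typing import Optional, Tuple


def crop_window(frame_w: int, frame_h: int, face_cx: Optional[float]) -> Tuple[int, int, int, int]:
    """Return (x, y, w, h) for a full-height 9:16 crop.

    face_cx is the horizontal centre of the detected face in pixels,
    or None when no face was found (falls back to a centre crop).
    """
    # Round the width down to an even number, which most encoders prefer.
    crop_w = min(int(frame_h * 9 / 16) // 2 * 2, frame_w)
    if face_cx is None:
        face_cx = frame_w / 2
    x = int(round(face_cx - crop_w / 2))
    # Clamp so the window never leaves the frame.
    x = max(0, min(x, frame_w - crop_w))
    return x, 0, crop_w, frame_h


def crop_filter(window: Tuple[int, int, int, int]) -> str:
    """Format the window as an FFmpeg crop filter (crop=w:h:x:y)."""
    x, y, w, h = window
    return f"crop={w}:{h}:{x}:{y}"
```

For a 1920×1080 frame with no face detected, `crop_window(1920, 1080, None)` gives `(657, 0, 606, 1080)`, a window centred on the frame.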

Punch Zoom

Adds slow-zoom punch-in effects at strategic moments to increase retention without looking artificial.
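This page does not say how punch zones are chosen or which easing is used. Purely as an illustration, a slow punch-in can be modelled as a scale factor that ramps linearly from 1.0 to a maximum over the zone. The `zoom_factor` name and the 1.15 default are assumptions.

```python
def zoom_factor(t: float, start: float, end: float, max_zoom: float = 1.15) -> float:
    """Scale factor at time t for a linear slow zoom over [start, end].

    Returns 1.0 (no zoom) outside the zone. The linear ramp and the
    default max_zoom are assumptions made for this sketch.
    """
    if end <= start:
        raise ValueError("zone end must be after zone start")
    if t < start or t > end:
        return 1.0
    progress = (t - start) / (end - start)
    return 1.0 + (max_zoom - 1.0) * progress
```

Multiplying the crop window's scale by this factor before cropping produces a gradual push-in rather than a hard cut.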

ASS Captions with Highlight

Word-level timestamps from Whisper drive 3–5 word caption groups with an active-word highlight that tracks speech.
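One way to sketch the active-word highlight is to emit one ASS `Dialogue` event per word, repeating the whole group and wrapping the current word in a colour override tag. The fixed group size of 4, the colours, the `Default` style name, and the helper names are assumptions; a trailing group may be shorter than three words in this simplified version.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds)


def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp, H:MM:SS.cc (centiseconds)."""
    cs = int(round(seconds * 100))
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def dialogue_lines(words: List[Word], size: int = 4,
                   highlight: str = "&H00FFFF&", base: str = "&HFFFFFF&") -> List[str]:
    """Build one Dialogue event per word, highlighting the word being spoken."""
    lines = []
    for g in range(0, len(words), size):
        group = words[g:g + size]
        for active, (_, start, end) in enumerate(group):
            # Hold the event until the next word starts so the caption does not flicker.
            if active + 1 < len(group):
                end = group[active + 1][1]
            parts = []
            for i, (text, _, _) in enumerate(group):
                if i == active:
                    # \c sets the primary colour (ASS colours are in BGR order).
                    parts.append("{\\c" + highlight + "}" + text + "{\\c" + base + "}")
                else:
                    parts.append(text)
            lines.append(
                f"Dialogue: 0,{ass_time(start)},{ass_time(end)},Default,,0,0,0,,"
                + " ".join(parts)
            )
    return lines
```

These lines would go under the `[Events]` section of an `.ass` file, which FFmpeg can then burn into the video.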

Silence Removal

WebRTC VAD detects and cuts silent sections. The --silence-gap flag sets the minimum gap to cut.
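To make the minimum-gap behaviour concrete, here is a small sketch that takes speech regions (as a VAD might report them) and merges any that are separated by less than the gap threshold, so only longer silences are cut. It assumes `--silence-gap` is expressed in seconds; the `merge_speech` helper is hypothetical.

```python
from typing import List, Tuple

Region = Tuple[float, float]  # (start_seconds, end_seconds)


def merge_speech(regions: List[Region], silence_gap: float) -> List[Region]:
    """Merge speech regions whose separating silence is shorter than silence_gap.

    Silences of at least silence_gap seconds remain as cut points.
    """
    merged: List[Region] = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] < silence_gap:
            # Gap too short to cut: extend the previous region.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With a 0.5 s threshold, `merge_speech([(0.0, 1.0), (1.2, 2.0), (3.0, 4.0)], 0.5)` keeps the 0.2 s pause and cuts only the 1.0 s one, returning `[(0.0, 2.0), (3.0, 4.0)]`.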

Loudnorm Audio

EBU R128 loudness normalisation is applied per segment and again on the final output, so audio stays consistent across all clips.
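FFmpeg exposes EBU R128 normalisation through its `loudnorm` audio filter. The sketch below only builds a single-pass command line; the target values (-23 LUFS integrated, -1 dBTP true peak, loudness range 7) and the `loudnorm_cmd` helper are assumptions, since this page does not state the settings the tool uses.

```python
from typing import List


def loudnorm_cmd(src: str, dst: str, integrated: float = -23.0,
                 true_peak: float = -1.0, lra: float = 7.0) -> List[str]:
    """Build an FFmpeg command that applies single-pass loudnorm to src.

    I, TP and LRA are real loudnorm options; the values are assumed targets.
    """
    audio_filter = f"loudnorm=I={integrated}:TP={true_peak}:LRA={lra}"
    # Video is stream-copied; only the audio is re-encoded through the filter.
    return ["ffmpeg", "-y", "-i", src, "-af", audio_filter, "-c:v", "copy", dst]
```

The resulting list can be passed to `subprocess.run(..., check=True)` on a machine with FFmpeg installed.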

Test-driven, CI/CD

Unit tests cover every module. Automated CI runs on every push. Zero suppressed warnings.

Built on Proven Tools.

Python
CLI · core pipeline
Whisper
Word-level transcription
WebRTC VAD
Speech detection
FFmpeg
Video render + loudnorm
OpenCV
Face detection + tracking
Automated testing
Tests · linting · CI

V2 Complete.

All V2 modules are shipped. V3 (smart tracking, AI-guided cuts) is on the roadmap.

Core pipeline (audio → render) · ✓ V2 Complete
Whisper word-level timestamps · ✓ V2 Complete
Face tracking crop · ✓ V2 Complete
ASS captions + active word highlight · ✓ V2 Complete
55 unit tests + CI/CD · ✓ V2 Complete
E2E test on real clips · → In Progress
V3: Smart tracking, AI-guided cuts · Roadmap

Contact

Questions, feedback, or integration requests — reach out directly.

contact@lpagesapplabs.com