This pipeline automates a common and time-consuming creator task, repurposing horizontal content for vertical social platforms, which could save significant production time and improve engagement by keeping the active speaker centered. Media and tech companies could deploy similar pipelines to scale content repackaging for short-form distribution.
A developer demonstrated building an autonomous app that converts landscape (16:9) videos into vertical (9:16) social clips by combining YOLO face detection, MediaPipe speaking detection, smoothing logic, and FFmpeg cropping. The developer used Claude Code and Opus 4.5 agents to plan, implement, and iterate, fixing an off-by-one bug and testing end-to-end conversion on a sample Linus Tech Tips clip. Initial outputs worked but flickered between faces, prompting adjustments to prefer the highest-confidence speaking face and stabilize cropping. Re-running the conversions showed reduced instability, though robust speaker selection still needs further tuning.
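As a rough illustration of the selection, smoothing, and cropping steps described above, here is a minimal Python sketch. It is not the developer's code: the `Face` record, the helper names, the exponential moving average, and the `alpha` value are all assumptions made for this example, and the YOLO and MediaPipe stages are replaced by plain per-frame data. Only the FFmpeg `crop=w:h:x:y` filter syntax is taken from FFmpeg itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Face:
    """Hypothetical per-frame detection record (stand-in for detector output)."""
    cx: float          # face centre x, in source pixels
    confidence: float  # detector confidence in [0, 1]
    speaking: bool     # result of a separate speaking detector


def crop_width(src_h: int) -> int:
    """Width of a full-height 9:16 window, rounded down to an even number."""
    w = src_h * 9 // 16
    return w - (w % 2)


def pick_face(faces: List[Face]) -> Optional[Face]:
    """Prefer the highest-confidence speaking face; otherwise any face."""
    pool = [f for f in faces if f.speaking] or faces
    return max(pool, key=lambda f: f.confidence) if pool else None


def smooth_centres(frames: List[List[Face]], src_w: int,
                   alpha: float = 0.2) -> List[float]:
    """Exponential moving average of the crop centre, one value per frame.

    Starts at the frame centre and holds the last value when no face is found.
    A smaller alpha (an assumed tuning knob) reacts more slowly, which damps
    flicker when the selected face changes from frame to frame.
    """
    centre = src_w / 2
    out = []
    for faces in frames:
        face = pick_face(faces)
        if face is not None:
            centre += alpha * (face.cx - centre)
        out.append(centre)
    return out


def crop_x(centre: float, src_w: int, crop_w: int) -> int:
    """Left edge of the crop window, clamped so it stays inside the frame."""
    x = int(round(centre - crop_w / 2))
    return max(0, min(x, src_w - crop_w))


def ffmpeg_crop_filter(x: int, src_h: int, crop_w: int) -> str:
    """FFmpeg crop filter string for one window position (crop=w:h:x:y)."""
    return f"crop={crop_w}:{src_h}:{x}:0"
```

For a 1920x1080 source, `crop_width(1080)` gives a 606-pixel-wide window, and the string from `ffmpeg_crop_filter` can be passed to `ffmpeg -vf` for a fixed window position. How the original app applies a position that changes per frame is not stated in the source, so that part is left out of the sketch.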