MediaMinds - AI Video Generation

Capstone Project • May 2023 – May 2024

Project Overview

MediaMinds generates short-form video content from text using AI. It features text processing, summarization, audio narration, and synchronized visual elements.

Python · OpenCV · Stable Diffusion · FFmpeg · NLP · Google TTS · Matplotlib

Key Features

  • Developed a pipeline to generate videos from textual inputs using web scraping, NLP preprocessing, Stable Diffusion v1.5, and Google TTS, reducing manual video creation time by 35%.
  • Implemented image generation per text chunk, subtitle overlay using Matplotlib (both sketched after this list), and custom video rendering via OpenCV and FFmpeg to ensure frame-accurate audio-video synchronization.
  • Integrated optional voice transformation with a pretrained RVC model, enabling voice customization and pitch alteration.
  • Enabled end-to-end automation from article scraping to final video generation with transitions, subtitles, and aligned audio in under 3 minutes.
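
The per-chunk image generation and subtitle overlay can be sketched as follows. This is a minimal illustration that assumes the Hugging Face diffusers library and a CUDA GPU; the model ID, the generate_frame helper, and the file names are placeholders rather than the project's exact code.

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering, no display required
import matplotlib.pyplot as plt
import torch
from diffusers import StableDiffusionPipeline

# Load Stable Diffusion v1.5 once and reuse it for every text chunk.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def generate_frame(chunk: str, index: int) -> str:
    """Create a visual for one text chunk and burn the chunk in as a subtitle."""
    image = pipe(chunk, num_inference_steps=30).images[0]

    # Overlay the subtitle with Matplotlib: the generated image fills the axes,
    # and the caption sits in a semi-transparent box near the bottom edge.
    fig, ax = plt.subplots(figsize=(image.width / 100, image.height / 100), dpi=100)
    ax.imshow(image)
    ax.axis("off")
    ax.text(0.5, 0.05, chunk, transform=ax.transAxes, ha="center", va="bottom",
            color="white", fontsize=14, wrap=True,
            bbox=dict(facecolor="black", alpha=0.6, pad=6))
    fig.subplots_adjust(left=0, right=1, top=1, bottom=0)

    out_path = f"frame_{index:03d}.png"
    fig.savefig(out_path, dpi=100)
    plt.close(fig)
    return out_path
```

Rendering captions through Matplotlib rather than OpenCV's built-in text drawing keeps font handling and wrapping of longer chunks simple.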

Technical Implementation

The project combines web scraping, NLP preprocessing, generative image models, text-to-speech synthesis, and programmatic video rendering to transform text content into engaging video presentations. The pipeline processes input text through four stages:

  1. Text Processing: Web scraping and NLP preprocessing to extract and structure content
  2. Image Generation: Stable Diffusion v1.5 creates relevant visuals for each text segment
  3. Audio Synthesis: Google TTS converts text to speech with natural intonation
  4. Video Assembly: OpenCV and FFmpeg combine images, audio, and subtitles into the final video (see the sketch below)
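
Steps 3 and 4 can be sketched as below: synthesize the narration for a chunk, hold that chunk's frame for exactly the narration length, and mux the streams with FFmpeg. This assumes the gTTS and mutagen packages; render_segment and the file names are illustrative, not the project's exact implementation.

```python
import subprocess
import cv2
from gtts import gTTS
from mutagen.mp3 import MP3

def render_segment(text: str, image_path: str, out_path: str, fps: int = 30) -> None:
    """Narrate one text chunk and hold its image for the narration's duration."""
    # Audio synthesis: Google TTS via the gTTS package.
    audio_path = "narration.mp3"
    gTTS(text).save(audio_path)
    duration = MP3(audio_path).info.length  # narration length in seconds

    # Silent video: repeat the still frame for the narration duration.
    frame = cv2.imread(image_path)
    height, width = frame.shape[:2]
    silent_path = "silent.mp4"
    writer = cv2.VideoWriter(silent_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (width, height))
    for _ in range(int(duration * fps)):
        writer.write(frame)
    writer.release()

    # Mux audio and video with FFmpeg; -shortest trims any residual drift so
    # the streams stay aligned.
    subprocess.run(
        ["ffmpeg", "-y", "-i", silent_path, "-i", audio_path,
         "-c:v", "copy", "-c:a", "aac", "-shortest", out_path],
        check=True,
    )

render_segment("MediaMinds turns articles into short videos.",
               "frame_000.png", "segment_000.mp4")
```

Per-segment clips produced this way can then be concatenated (for example with FFmpeg's concat demuxer) and joined with transitions to form the final video.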

Results & Impact

This capstone project demonstrated the potential of AI-driven content creation, achieving a 35% reduction in video production time while maintaining high-quality output. The system processes articles and generates complete videos with synchronized audio and visual elements in under 3 minutes.