Building Blocks of Text-to-Video Generation

Deep Learning Course Final Project
Project Details

In this blog post, we dissect and explain, for a general audience, the mechanics behind the key building blocks of state-of-the-art Text-to-Video generation. We provide detailed illustrations and interactive examples of these building blocks and highlight the key differences between two Text-to-Video models: Imagen Video and Make-a-Video. Finally, we show how the building blocks fit together into a complete Text-to-Video framework and note the current failure modes and limitations of these models. Note that interactive page elements may take time to load.

