Open-Source AI Model Focused on Video Creation: Pyramid Flow

Pyramid Flow is an open-source artificial intelligence model for video creation that produces high-quality clips up to 10 seconds long. It was developed by researchers from Peking University and Beijing University of Posts and Telecommunications in collaboration with Kuaishou Technology, the company behind the AI video platform Kling AI.

Details of the Model

Pyramid Flow is built on a technique called pyramidal flow matching. In this approach, a single AI model generates the video progressively, in stages. Most of those stages work at low resolution, and the model produces a full-resolution version only at the end of the generation process. The proposed pyramidal flow reduces the token count by a factor of four compared to traditional diffusion models, which makes training more efficient. The model can also compress the video at different stages of generation, which lets Pyramid Flow converge faster during training and process more samples per training batch. The concept is described in detail in the paper titled Pyramidal Flow Matching for Efficient Video Generative Modeling.
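To see why the low-resolution stages are cheap, it helps to count tokens. The sketch below is illustrative arithmetic only, not the paper's own accounting: the 1280-pixel width, the 8x latent downsampling, the 2x2 patch size, and the three-stage pyramid are all assumptions chosen for the example. It shows only the general point that halving both height and width cuts the number of spatial tokens per frame to a quarter.

```python
def tokens_per_frame(height, width, vae_downsample=8, patch=2):
    """Spatial tokens for one frame.

    vae_downsample and patch are hypothetical values, not taken from the paper.
    """
    cell = vae_downsample * patch  # pixels covered by one token along each axis
    return (height // cell) * (width // cell)


def pyramid_token_counts(height, width, stages=3):
    """Token counts from the coarsest stage up to full resolution.

    Each stage halves height and width relative to the next finer one.
    """
    return [
        tokens_per_frame(height // 2**k, width // 2**k)
        for k in reversed(range(stages))
    ]


# 768p frame; the 1280-pixel width is an assumption for this example.
counts = pyramid_token_counts(768, 1280)
print(counts)  # [240, 960, 3840]

# Each finer stage carries four times the tokens of the stage before it.
ratios = [fine // coarse for coarse, fine in zip(counts, counts[1:])]
print(ratios)  # [4, 4]
```

Under these assumptions, only the last stage pays for all 3,840 tokens per frame. The earlier stages handle 240 and 960, so the more of the generation process that runs at the coarse levels, the fewer tokens the model processes overall.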

Training Data

The model is trained on open-source datasets and produces videos 5 to 10 seconds long at 768p resolution and 24 frames per second. The training datasets include LAION-5B, a large dataset for multimodal AI research; CC-12M, a dataset of image-text pairs scraped from the web; SA-1B, which contains high-quality, non-blurry images; and WebVid-10M and OpenVid-1M, two video datasets widely used for text-to-video generation.

The researchers mention that they have curated approximately 10 million single-shot videos in total. However, the openness of these datasets poses challenges, such as issues of copyright infringement and the potential for generating illegal content.

During inference, the model can produce a 5-second, 384p video in 56 seconds. Compared to other diffusion models, it exhibits equal or even superior performance. Nonetheless, Runway’s Gen-3 Alpha Turbo can produce videos in under one minute, often in 10 to 20 seconds, which sets a high standard for AI video generation speed. Even so, the open-source Pyramid Flow competes with subscription-based models such as Runway’s Gen-3 Alpha, Luma’s Dream Machine, Kling, and Hailuo.

Pyramid Flow does have some limitations. The model lacks certain advanced fine-tuning capabilities found in models like Runway Gen-3 Alpha, which allow for precise control over cinematic elements such as camera angles, keyframes, and human movements.

The model’s raw code can be downloaded from Hugging Face and GitHub. The model can also be run from an inference shell, although users must download the model code and execute it on their own machines to use it this way. Pyramid Flow is published under the MIT License, which permits a wide range of uses, including commercial applications, modifications, and redistribution, as long as the copyright notice is preserved. All code and model weights will be made available to users for free through the official project pages.

Arwen Volkov

A graduate of the University of St. Gallen in Switzerland with a degree in International Finance, Arwen specializes in sustainable finance and green investments. She began her career at an investment bank in London, where she developed financing models for environmentally friendly projects. Known for her analytical and strategic thinking skills, Arwen is a sought-after financial consultant. In her spare time, she mentors fintech startups, contributing to their growth strategies. She is also a nature enthusiast and an amateur photographer.
