Abstract: Variational Autoencoder (VAE) aims to compress pixel data into low-dimensional latent space, playing an important role in OpenAI’s Sora and other latent video diffusion generation models.