blog

  • MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models

This blog post offers an introduction to MDM-Prime-v2, an improved version of MDM-Prime that enables compute-optimal scaling of diffusion language models. We show that the sub-tokenizer of MDM-Prime is sub-optimal for likelihood estimation. By incorporating two simple techniques, binary encoding and index shuffling, MDM-Prime-v2 achieves 21.8x more efficient training than autoregressive models in the compute-optimal regime.

  • Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking

This blog post offers an introduction to MDM-Prime, a generalized masked diffusion model (MDM) that enables partially unmasked tokens during sampling. We begin with a review of MDMs and their limitations. Then, we explore a partial masking scheme (Prime) that introduces intermediate token states between masked and unmasked representations. Finally, we present experimental results to demonstrate the effectiveness of MDM-Prime.

  • Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow

This blog post offers an introduction to our proposed MEow algorithm. We begin with a review of MaxEnt RL and EBFlow. Then, we explore the connections between these models by introducing MEow. Finally, we present experimental results to demonstrate the effectiveness of the proposed method.

  • Training Energy-Based Normalizing Flow with Score-Matching Objectives

This blog post offers an introduction to our proposed EBFlow modeling method. We begin with an overview of flow-based and energy-based models. Then, we explore the connections between these models by introducing EBFlow. Next, we present experimental results to demonstrate the effectiveness of the proposed method. Finally, we discuss several implications of the EBFlow formulation.

  • On Investigating the Conservative Property of Score-Based Generative Models

This blog post provides an introduction to our proposed QCSBM modeling method. We begin with motivational examples that examine the influence of the conservative property of score-based models. Next, we outline a training pipeline designed to backpropagate the conservativeness through the model. Finally, we present experimental results to demonstrate the effectiveness of our method.