Weimin Wu

Ph.D. Candidate, Computer Science, Northwestern University

On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality


Conference


Jerry Yao-Chieh Hu*, Weimin Wu*, Yi-Chen Lee*, Yu-Chao Huang*, Minshuo Chen, Han Liu
International Conference on Learning Representations (ICLR) 2025

PDF: https://arxiv.org/abs/2411.17522
Cite

APA
Hu*, J. Y.-C., Wu*, W., Lee*, Y.-C., Huang*, Y.-C., Chen, M., & Liu, H. (2025). On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality. In International Conference on Learning Representations (ICLR).


Chicago/Turabian
Hu*, Jerry Yao-Chieh, Weimin Wu*, Yi-Chen Lee*, Yu-Chao Huang*, Minshuo Chen, and Han Liu. “On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality.” In International Conference on Learning Representations (ICLR), 2025.


MLA
Hu*, Jerry Yao-Chieh, et al. “On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality.” International Conference on Learning Representations (ICLR), 2025.


BibTeX

@inproceedings{jerry-a,
  title = {On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2025},
  author = {Hu*, Jerry Yao-Chieh and Wu*, Weimin and Lee*, Yi-Chen and Huang*, Yu-Chao and Chen, Minshuo and Liu, Han}
}

Abstract:

We investigate the approximation and estimation rates of conditional diffusion transformers (DiTs) with classifier-free guidance. We present a comprehensive analysis of “in-context” conditional DiTs under four common data assumptions. We show that both conditional DiTs and their latent variants lead to the minimax optimality of unconditional DiTs under identified settings. Specifically, we discretize the input domains into infinitesimal grids and then perform a term-by-term Taylor expansion on the conditional diffusion score function under the Hölder smooth data assumption. This enables fine-grained use of transformers’ universal approximation through a more detailed piecewise constant approximation, and hence obtains tighter bounds. Additionally, we extend our analysis to the latent setting under the linear latent subspace assumption. We not only show that latent conditional DiTs achieve lower bounds than conditional DiTs in both approximation and estimation, but also show the minimax optimality of latent unconditional DiTs. Our findings establish statistical limits for conditional and unconditional DiTs, and offer practical guidance toward developing more efficient and accurate DiT models.
