Text-to-image generation powers content creation across design, media, and data augmentation.
Post-training of text-to-image generative models is a promising path to better alignment with human
preferences, improved factuality, and stronger aesthetics. We introduce SPARC (Self-Probing Adaptive
Reward by Confidence), a post-training framework that replaces external reward supervision with an
internal self-confidence signal, obtained by evaluating how accurately the model recovers injected
noise under self-denoising probes. SPARC converts this intrinsic signal into scalar rewards, enabling
fully unsupervised optimization without additional datasets, annotators, or reward models.
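As a rough illustration of the intrinsic signal described above, the sketch below scores a sample by injecting Gaussian noise and measuring how accurately a denoiser recovers it; the negative recovery error serves as a scalar self-confidence reward. This is a minimal, hypothetical rendering, not SPARC's actual implementation: the function names (`self_confidence_reward`, `toy_denoiser`) and parameters (`num_probes`, `sigma`) are illustrative assumptions.

```python
import numpy as np

def self_confidence_reward(denoiser, x0, num_probes=4, sigma=0.5, rng=None):
    # Hypothetical sketch of a self-probing reward: inject noise, ask the
    # model to recover it, and reward low recovery error (high confidence).
    rng = rng or np.random.default_rng(0)
    rewards = []
    for _ in range(num_probes):
        eps = rng.standard_normal(x0.shape)   # injected noise
        x_noisy = x0 + sigma * eps            # corrupted sample
        eps_hat = denoiser(x_noisy, sigma)    # model's noise estimate
        # Lower recovery error -> higher self-confidence -> higher reward.
        rewards.append(-np.mean((eps_hat - eps) ** 2))
    return float(np.mean(rewards))

def toy_denoiser(x_noisy, sigma):
    # Stand-in denoiser that assumes the clean sample is zero, so the
    # recovered noise is simply x_noisy / sigma (exact when x0 == 0).
    return x_noisy / sigma

r = self_confidence_reward(toy_denoiser, np.zeros((2, 3)))
```

For this toy denoiser the injected noise is recovered exactly, so the reward attains its maximum of zero; a real model's imperfect recovery would yield negative rewards that rank generations by confidence.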
Empirically, by reinforcing high-confidence generations, SPARC delivers consistent gains in
compositional generation, text rendering, and text-image alignment over the baseline.
We also find that integrating SPARC with external rewards yields complementary improvements
while mitigating reward hacking.