Next-Acceleration-Scale Prediction for Autoregressive MRI Reconstruction
arXiv:2605.19354v1
Visual Autoregressive Modeling · Accelerated MRI Reconstruction · On-Policy Information Distillation
Yilmaz Korkmaz¹, Vishal M. Patel¹
¹ Johns Hopkins University, Baltimore, USA
{ykorkma1, vpatel36}@jhu.edu

Overview. Instead of generating a residual latent pyramid from one image as in VAR, the proposed method induces a hierarchy before encoding by applying MRI-native Fourier undersampling at different acceleration factors. This turns accelerated MRI reconstruction into next-acceleration-scale prediction from sparse measurements toward the fully sampled acquisition.


Abstract

MRI reconstruction is an inherently ill-posed inverse problem, since incomplete measurements admit many plausible solutions. This ambiguity becomes more severe under high acceleration, where pixel-domain continuous predictors tend to average over feasible reconstructions and suppress high-frequency anatomy. We address this limitation by moving reconstruction to a discrete multi-scale latent space and posing it as autoregressive next-acceleration-scale prediction. Leveraging discrete priors proven effective in visual autoregressive modeling, our method restricts the solution to compact sequences of codebook tokens, enabling sharp reconstructions even from extremely sparse measurements. This discrete autoregressive formulation also aligns naturally with modern large language model post-training techniques. Building on this observation, we introduce on-policy privileged information distillation for visual autoregressive modeling. A teacher receives training-only privileged context that is unavailable at inference (in our case, fully sampled acquisitions) and supervises a student trained on its own rollouts, leading to consistent reconstruction gains. Through extensive experiments on the fastMRI benchmark, we show that our approach improves reconstruction performance across diverse sampling patterns under extreme undersampling.

Method

The framework combines an additive multi-input AQ-VAE, a cross-attentive autoregressive transformer, and an on-policy privileged distillation stage. The latent hierarchy follows acceleration scales 32x, 16x, 8x, 4x, 2x, and fully sampled, so the model learns to predict progressively less undersampled token maps from the sparse input sequence.
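The acceleration hierarchy described above can be illustrated with a small NumPy sketch. It is not the authors' sampling code. The nested random line masks, the `nested_masks` and `undersample` helpers, and the 64x64 random test image are all assumptions made for illustration; practical MRI masks usually also keep a fully sampled low-frequency band, which is omitted here.

```python
import numpy as np

# Acceleration scales from the text, coarsest first; 1 stands for fully sampled.
ACCELERATIONS = [32, 16, 8, 4, 2, 1]

def nested_masks(num_lines, accelerations=ACCELERATIONS, seed=0):
    """Build one 1D Cartesian line mask per acceleration factor.

    Assumption (not from the source): masks are nested, so every scale keeps
    all lines of the sparser scale before it, and lines are chosen at random.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_lines)      # fixed random priority over k-space lines
    masks = []
    for r in accelerations:
        keep = max(1, num_lines // r)       # number of acquired lines at rate r
        mask = np.zeros(num_lines, dtype=bool)
        mask[order[:keep]] = True
        masks.append(mask)
    return masks

def undersample(image, mask):
    """Zero-filled reconstruction: mask k-space columns, then inverse FFT."""
    kspace = np.fft.fftshift(np.fft.fft2(image, norm="ortho"))
    kspace = kspace * mask[None, :]         # broadcast the line mask over rows
    return np.fft.ifft2(np.fft.ifftshift(kspace), norm="ortho")

image = np.random.default_rng(1).standard_normal((64, 64))  # stand-in for an MR slice
masks = nested_masks(image.shape[1])
pyramid = [undersample(image, m) for m in masks]            # 32x first, fully sampled last
```

Each entry of `pyramid` is one level of the hierarchy, and the last entry recovers the input image because its mask keeps every line.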

AQ-VAE architecture. A label-conditioned multi-input tokenizer encodes acceleration-specific MRI inputs, quantizes them with a shared codebook, and fuses their latent contributions before decoding.
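A rough sketch of shared-codebook quantization followed by additive fusion is given below. The source does not spell out the fusion rule, so the plain sum of quantized latents is an assumption, as are the `quantize` helper, the codebook size, and the use of the same latent shape at every scale.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry (Euclidean distance).

    latents:  (n, d) array of encoder outputs
    codebook: (k, d) array shared by every acceleration scale
    Returns (indices, quantized vectors).
    """
    # Squared distances between every latent and every code: shape (n, k).
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 4))   # hypothetical sizes: 16 codes of dimension 4
# One hypothetical latent map per acceleration scale (6 scales, 8 positions each).
per_scale_latents = [rng.standard_normal((8, 4)) for _ in range(6)]

tokens, fused = [], np.zeros((8, 4))
for z in per_scale_latents:
    idx, zq = quantize(z, codebook)
    tokens.append(idx)                    # discrete token map for this scale
    fused += zq                           # assumed additive fusion before decoding
```

Here `tokens` plays the role of the per-scale token maps that the autoregressive model would predict, and `fused` is what a decoder would receive under the summation assumption.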


Cross-attentive transformer. The 16-block transformer preserves VAR-style self-attention while injecting AQ-VAE encoder features at 16x16, 32x32, and 64x64 resolutions to guide high-fidelity next-scale token prediction.
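To make the feature-injection idea concrete, here is a generic single-head cross-attention step in NumPy, in which token states act as queries and a flattened encoder feature map supplies keys and values. It is a textbook sketch and not the paper's block design. The `cross_attention` helper, the dimensions, and the random projection matrices are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # stabilize before exponentiating
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(tokens, features, w_q, w_k, w_v):
    """Single-head cross-attention: token states query encoder features.

    tokens:   (n_tok, d) hidden states of the autoregressive transformer
    features: (n_feat, d) flattened encoder feature map (e.g. 16*16 positions)
    """
    q, k, v = tokens @ w_q, features @ w_k, features @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (n_tok, n_feat), rows sum to 1
    return tokens + attn @ v                          # residual connection

rng = np.random.default_rng(0)
d = 8                                                 # hypothetical model width
tokens = rng.standard_normal((5, d))
features = rng.standard_normal((16 * 16, d))          # one 16x16 map, flattened
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out = cross_attention(tokens, features, w_q, w_k, w_v)
```

The same call would apply to the 32x32 and 64x64 maps by flattening them to 1024 and 4096 positions respectively.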


On-Policy Privileged Information Distillation

Distillation is the post-training stage: the student is optimized on the exact rollout states it visits during generation, while a privileged teacher provides sharper supervision using training-only continuous features from the fully sampled image.

Student rollouts: The student autoregressively generates the latent token sequence, using each sampled acceleration-scale token set as input for the next prediction.
Privileged teacher: The teacher receives the same generated token sequence plus continuous features extracted from the fully sampled MR image.
Reverse KL objective: Mode-seeking distillation discourages student probability mass on unsupported anatomical predictions.
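The mode-seeking behaviour of the reverse KL term can be checked on a toy example. The sketch below computes KL(student || teacher) from logits over a four-entry codebook. The `reverse_kl` helper and the hand-picked distributions are assumptions for illustration, and the paper's exact loss (weighting, temperature, per-scale averaging) is not given in the source.

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher), averaged over token positions.

    Both inputs have shape (n_positions, codebook_size) and are assumed to be
    evaluated on the same student-generated token sequence.
    """
    log_p_s = log_softmax(student_logits)
    log_p_t = log_softmax(teacher_logits)
    p_s = np.exp(log_p_s)
    return float((p_s * (log_p_s - log_p_t)).sum(axis=-1).mean())

# Toy teacher with two supported codes and two nearly unsupported ones.
teacher = np.log(np.array([[0.49, 0.49, 0.01, 0.01]]))
one_mode = np.log(np.array([[0.97, 0.01, 0.01, 0.01]]))  # commits to one teacher mode
spread = np.log(np.array([[0.25, 0.25, 0.25, 0.25]]))    # puts mass on unsupported codes

kl_one_mode = reverse_kl(one_mode, teacher)
kl_spread = reverse_kl(spread, teacher)
```

In this toy setting `kl_spread` exceeds `kl_one_mode`, so the reverse direction penalizes the student that spreads mass onto codes the teacher barely supports. Swapping the arguments gives the forward KL, which ranks the two students the other way round.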

BibTeX

@article{korkmaz2026nextacceleration,
  title   = {Next-Acceleration-Scale Prediction for Autoregressive MRI Reconstruction},
  author  = {Korkmaz, Yilmaz and Patel, Vishal M.},
  journal = {arXiv preprint arXiv:2605.19354},
  year    = {2026}
}