Overview. Instead of generating a residual latent pyramid from one image as in VAR, the proposed method induces a hierarchy before encoding by applying MRI-native Fourier undersampling at different acceleration factors. This recasts accelerated MRI reconstruction as next-acceleration-scale prediction, progressing from sparse measurements toward the fully sampled acquisition.
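The input hierarchy can be illustrated with a minimal NumPy sketch. It assumes single-coil 2D Cartesian sampling, equispaced phase-encode columns plus a fully sampled low-frequency band, and zero-filled inverse FFTs. The mask pattern, `center_fraction`, the acceleration factors (8, 4, 2, 1), and the function names are hypothetical illustration choices, not the paper's exact sampling scheme.

```python
import numpy as np

def cartesian_mask(n_cols, accel, center_fraction=0.08):
    """Hypothetical equispaced Cartesian mask: keep every `accel`-th
    phase-encode column plus a fully sampled low-frequency center band."""
    mask = np.zeros(n_cols, dtype=bool)
    mask[::accel] = True
    n_center = max(1, int(round(n_cols * center_fraction)))
    start = (n_cols - n_center) // 2          # center of the fftshifted k-space
    mask[start:start + n_center] = True
    return mask

def undersampled_pyramid(image, accels=(8, 4, 2, 1)):
    """Return (acceleration, zero-filled magnitude image) pairs,
    ordered from the sparsest input to the fully sampled one."""
    kspace = np.fft.fftshift(np.fft.fft2(image))
    levels = []
    for r in accels:
        mask = cartesian_mask(image.shape[1], r)
        masked = kspace * mask[np.newaxis, :]  # drop unsampled columns
        recon = np.fft.ifft2(np.fft.ifftshift(masked))
        levels.append((r, np.abs(recon)))
    return levels
```

With power-of-two factors these masks are nested, so each lower-acceleration level contains every k-space sample of the level before it, which matches the idea of predicting progressively denser scales.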
Abstract
Method
AQ-VAE architecture. A label-conditioned multi-input tokenizer encodes acceleration-specific MRI inputs, quantizes them with a shared codebook, and fuses their latent contributions before decoding.
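The tokenizer's data flow can be sketched in NumPy under strong simplifying assumptions: a toy linear encoder shared across inputs, an additive acceleration-label embedding for conditioning, nearest-neighbour vector quantization against one shared codebook, and fusion by averaging the quantized latents. The dimensions, `label_emb`, `W_enc`, and the averaging rule are hypothetical; the actual AQ-VAE encoder and fusion design are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 16, 64                                   # hypothetical latent dim, codebook size
codebook = rng.normal(size=(K, D))              # ONE codebook shared by all inputs
label_emb = {r: rng.normal(size=D) for r in (8, 4, 2)}  # hypothetical label embeddings
W_enc = rng.normal(size=(D, D)) / np.sqrt(D)    # toy encoder weights shared across inputs

def encode(feats, accel):
    """Toy label-conditioned encoder: shared projection plus an
    acceleration-specific embedding. feats: (N positions, D)."""
    return feats @ W_enc + label_emb[accel]

def quantize(z):
    """Nearest-neighbour lookup into the shared codebook."""
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]

def fuse(inputs):
    """Quantize each acceleration-specific input, then average the
    quantized latents (averaging is an assumed fusion rule)."""
    indices, latents = {}, []
    for accel, feats in inputs.items():
        idx, zq = quantize(encode(feats, accel))
        indices[accel] = idx
        latents.append(zq)
    return indices, np.mean(latents, axis=0)
```

Sharing the codebook means every acceleration level is expressed in the same discrete vocabulary, which is what lets a single transformer predict tokens across scales.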
Cross-attentive transformer. The 16-block transformer preserves VAR-style self-attention while injecting AQ-VAE encoder features at 16×16, 32×32, and 64×64 resolutions to guide high-fidelity next-scale token prediction.
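A single-head NumPy sketch of one such block: residual self-attention over the token sequence, followed by residual cross-attention whose keys and values come from a flattened encoder feature map of the injected resolution. It omits VAR's block-causal mask, multi-head splitting, normalization, and the MLP, and which blocks receive which resolution is not specified here; all weights and names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Single-head scaled dot-product attention over flattened positions."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def block(tokens, enc_feats, self_w, cross_w):
    """Toy block: residual self-attention, then residual cross-attention to an
    encoder feature map of shape (H, W, D) when one is injected at this block."""
    Wq, Wk, Wv = self_w
    h = tokens + attention(tokens @ Wq, tokens @ Wk, tokens @ Wv)
    if enc_feats is not None:
        Cq, Ck, Cv = cross_w
        kv = enc_feats.reshape(-1, enc_feats.shape[-1])   # (H*W, D) keys/values
        h = h + attention(h @ Cq, kv @ Ck, kv @ Cv)
    return h
```

Because cross-attention flattens the feature map into keys and values, the same block code accepts 16×16, 32×32, or 64×64 inputs; only the number of attended positions changes.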
On-Policy Privileged Information Distillation. Distillation is the post-training stage: the student is optimized on the exact rollout states it visits during generation, while a privileged teacher provides sharper supervision using training-only continuous features from the fully sampled image.
Student rollouts
The student autoregressively generates the latent token sequence, using each sampled acceleration-scale token set as input for the next prediction.
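A toy NumPy rollout shows which states on-policy distillation trains on: at each acceleration scale the student samples a token set conditioned on everything sampled so far, and the prefix it actually visited is recorded. `toy_policy` is a hypothetical stand-in for the transformer, and the per-scale token counts `(4, 16, 64)` are illustrative, not the model's real scale schedule.

```python
import numpy as np

def toy_policy(prefix, n_tokens, vocab, rng):
    """Hypothetical stand-in for the transformer: logits for n_tokens
    positions, weakly dependent on the sampled prefix."""
    bias = 0.0 if prefix.size == 0 else prefix.mean() / vocab
    return rng.normal(size=(n_tokens, vocab)) + bias

def rollout(scale_sizes, vocab, seed=0):
    """Sample one token set per acceleration scale, feeding each sampled
    set back in as context for the next scale."""
    rng = np.random.default_rng(seed)
    prefix = np.empty(0, dtype=int)
    states = []
    for n in scale_sizes:
        logits = toy_policy(prefix, n, vocab, rng)
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        tokens = np.array([rng.choice(vocab, p=p) for p in probs])
        states.append(prefix.copy())      # the state the student actually visited
        prefix = np.concatenate([prefix, tokens])
    return prefix, states
```

Training on `states` from the student's own samples, rather than on ground-truth prefixes, is what makes the distillation on-policy.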
Privileged teacher
The teacher receives the same generated token sequence plus continuous features extracted from the fully sampled MR image.
Reverse KL objective
Mode-seeking reverse-KL distillation discourages the student from placing probability mass on anatomical predictions the teacher does not support.
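The objective can be sketched as a per-position reverse KL, KL(p_student ‖ p_teacher), averaged over positions. Here the teacher logits are passed in directly; in the described method they would come from a teacher forward pass on the same student-generated tokens plus privileged fully sampled features, which this sketch does not model.

```python
import numpy as np

def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def reverse_kl(student_logits, teacher_logits):
    """Mean over positions of KL(p_student || p_teacher).
    Expectation is taken under the student, so student mass on tokens
    the teacher considers unlikely is penalized heavily."""
    log_ps = log_softmax(student_logits)
    log_pt = log_softmax(teacher_logits)
    ps = np.exp(log_ps)
    return float((ps * (log_ps - log_pt)).sum(axis=-1).mean())
```

For example, if the teacher splits its mass between two tokens, a student concentrated on one of them scores a lower reverse KL than a student spread uniformly over all tokens, because the uniform student places mass on tokens the teacher rules out. This is the mode-seeking behavior described above.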
BibTeX