Topic
model compression
New Drift-RAE Method Distills Transformers Efficiently Using Representation Autoencoders
A new research paper proposes Drift-RAE, a method for distilling pretrained flow models in the latent spaces of representation autoencoders (RAEs). The method addresses the anisotropy and large curvature of these latent spaces and reaches an FID of 1.77 on ImageNet 256×256 with only 10,000 distillation steps, outperforming existing RAE distillation methods.
Lightweight Hardware-Aware Neural Architecture Search Enables CNNs on Ultra-Low-Power Microcontrollers
A new hardware-aware neural architecture search (HW-NAS) method generates tiny convolutional neural networks (CNNs) for ultra-low-power microcontrollers. Its search procedure is lightweight enough to run on embedded devices themselves. On three tiny computer vision benchmarks, the generated networks preserve state-of-the-art classification accuracy. The approach targets the tight power budgets of sensing nodes.
New Automated Quantization Framework AQ4SViT Compresses Spiking Vision Transformers for Embedded AI
Researchers propose AQ4SViT, an automated quantization framework for Spiking Vision Transformers that uses a search gating policy to find optimal compression settings. It comes in two variants: Greedy search, which favors speed, and Beam search, which favors deeper compression. Experiments on ImageNet show up to 6.6x faster search and up to 90% memory savings while keeping accuracy within 1.5% of the original model.
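To illustrate the general trade-off between the Greedy and Beam search variants mentioned above, here is a minimal toy sketch in Python. It is not AQ4SViT's algorithm: the function names, the per-layer bit-width levels, the additive accuracy-drop table, and the budget are all hypothetical assumptions chosen for illustration. Greedy search is modeled simply as beam search with a beam width of 1.

```python
from typing import Dict, Iterator, Sequence, Tuple

Config = Tuple[int, ...]  # one bit-width per layer (hypothetical representation)


def memory_bits(config: Config, layer_sizes: Sequence[int]) -> int:
    """Total weight memory in bits for a per-layer bit-width assignment."""
    return sum(bits * size for bits, size in zip(config, layer_sizes))


def accuracy_drop(config: Config, drop_table: Sequence[Dict[int, float]]) -> float:
    """Toy proxy: the accuracy drop is assumed to be additive across layers."""
    return sum(drop_table[i][bits] for i, bits in enumerate(config))


def neighbors(config: Config, levels: Sequence[int]) -> Iterator[Config]:
    """Yield configs that lower exactly one layer to the next smaller bit-width."""
    for i, bits in enumerate(config):
        idx = levels.index(bits)
        if idx + 1 < len(levels):
            yield config[:i] + (levels[idx + 1],) + config[i + 1:]


def beam_search(layer_sizes: Sequence[int],
                drop_table: Sequence[Dict[int, float]],
                levels: Sequence[int],
                budget: float,
                width: int) -> Config:
    """Return the smallest-memory config found whose proxy drop stays within budget.

    `levels` must be sorted from highest to lowest bit-width.
    width=1 gives greedy search; larger widths explore more alternatives.
    """
    def key(c: Config):
        return (memory_bits(c, layer_sizes), c)

    start = tuple(levels[0] for _ in layer_sizes)
    beam, best = [start], start
    while beam:
        # Expand every config in the beam and keep only the feasible children.
        candidates = {
            n for c in beam for n in neighbors(c, levels)
            if accuracy_drop(n, drop_table) <= budget
        }
        beam = sorted(candidates, key=key)[:width]
        if beam and key(beam[0]) < key(best):
            best = beam[0]
    return best


# Hypothetical three-layer model: one large, sensitive layer and two smaller ones.
layer_sizes = [100, 70, 70]
drop_table = [{8: 0.0, 4: 1.0}, {8: 0.0, 4: 0.5}, {8: 0.0, 4: 0.5}]
levels = (8, 4)

greedy = beam_search(layer_sizes, drop_table, levels, budget=1.0, width=1)
beam = beam_search(layer_sizes, drop_table, levels, budget=1.0, width=2)
print(greedy, memory_bits(greedy, layer_sizes))  # (4, 8, 8) 1520
print(beam, memory_bits(beam, layer_sizes))      # (8, 4, 4) 1360
```

In this toy setup, greedy search spends the whole accuracy budget on the largest layer and then stops, while the wider beam keeps a second candidate alive and ends with a smaller model. That mirrors the speed versus compression-depth trade-off the summary describes, under the stated assumptions only.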