Learn how to run local AI models with LM Studio's user, power user, and developer modes, keeping your data private and avoiding monthly subscription fees.
Abstract: Quantization is a critical technique, employed across various research fields, for compressing deep neural networks (DNNs) so they can be deployed in resource-limited environments. This ...
Abstract: Directly affecting both error performance and complexity, quantization is critical for MMSE MIMO detection. However, naively pruning quantization levels is ...
SD.Next Quantization provides full cross-platform quantization to reduce memory usage and improve performance on any device. Triton additionally enables optimized kernels for further speedups.
Thanks to AWQ, TinyChat can deliver more efficient LLM/VLM chatbot responses through 4-bit inference. TinyChat with LLaMA-3-8B on an RTX 4090 runs 2.7x faster than FP16.
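The memory savings behind 4-bit inference come from storing each weight in 4 bits plus a small per-group scale, instead of 16 bits. The sketch below is a generic round-to-nearest symmetric 4-bit quantizer in NumPy, not AWQ's actual activation-aware method (which additionally scales salient channels before rounding); function names and the group size are illustrative assumptions.

```python
import numpy as np

def quantize_4bit(w, group_size=8):
    """Round-to-nearest symmetric 4-bit quantization with per-group scales.

    Generic illustration only; AWQ's activation-aware scaling is more involved.
    """
    groups = w.reshape(-1, group_size)
    # One floating-point scale per group; the signed 4-bit range is [-8, 7].
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero groups
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale, shape):
    """Reconstruct approximate FP32 weights from 4-bit codes and scales."""
    return (q.astype(np.float32) * scale).reshape(shape)

# Usage: quantize a toy weight matrix and measure the reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16)).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s, w.shape)
max_err = np.abs(w - w_hat).max()  # bounded by half a quantization step
```

Each group's worst-case error is half a quantization step (scale / 2), which is why small group sizes trade a little extra scale storage for better accuracy.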