
On How Sparse Autoencoders Are Reshaping LLM Interpretability - Zhihu
Sparse Autoencoder (SAE): matrix factorization is inefficient here because its "orthogonal basis" assumption is very strong, so the SAE instead optimizes a softer constraint: each basis vector's activation over the data should be as sparse as possible.
Using Sparse Autoencoders (SAEs) to Improve Language Model …
Below is an experimental code implementation based on the paper "Sparse Autoencoders Find Highly Interpretable Features in Language Models" (arXiv:2309.08600v3), covering training a sparse autoencoder (Sparse Autoencoder, …
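For concreteness, one common way to write that softer constraint (a sketch of a standard SAE objective; the exact formulation varies across papers, and the symbols below are not taken from the article) is:

```latex
% A common SAE objective (one variant; details differ across papers).
% The decoder directions in W_dec act as a non-orthogonal basis, and the L1 term
% pushes the code z to be sparse on the data instead of forcing orthogonality.
\begin{align*}
  z              &= \operatorname{ReLU}\bigl(W_{\text{enc}}\, x + b_{\text{enc}}\bigr) \\
  \hat{x}        &= W_{\text{dec}}\, z + b_{\text{dec}} \\
  \mathcal{L}(x) &= \lVert x - \hat{x} \rVert_2^2 \;+\; \lambda \,\lVert z \rVert_1
\end{align*}
```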
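The paper's own code is not reproduced here; the following is only a minimal, hedged sketch of what such a training loop typically looks like, assuming activations have already been cached as a tensor `acts` and using a plain linear encoder/decoder with an L1 penalty (all sizes and hyperparameters are illustrative, not the paper's):

```python
import torch
import torch.nn.functional as F

# Hypothetical setup: `acts` stands in for cached LM activations, shape (num_tokens, d_model).
d_model, d_hidden, lam = 768, 8 * 768, 1e-3       # illustrative sizes, not the paper's
acts = torch.randn(10_000, d_model)               # placeholder for real cached activations

enc = torch.nn.Linear(d_model, d_hidden)
dec = torch.nn.Linear(d_hidden, d_model)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)

for step in range(1_000):
    idx = torch.randint(0, acts.shape[0], (1024,))    # random minibatch of activations
    x = acts[idx]
    z = F.relu(enc(x))                                # sparse code
    x_hat = dec(z)                                    # reconstruction
    loss = F.mse_loss(x_hat, x) + lam * z.abs().mean()  # reconstruction + L1 sparsity
    opt.zero_grad()
    loss.backward()
    opt.step()
```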
A Survey on Sparse Autoencoders: Interpreting the Internal …
Mar 7, 2025 · Among various mechanistic interpretability approaches, Sparse Autoencoders (SAEs) have emerged as a promising method due to their ability to disentangle the complex, …
Sparse Autoencoders in Deep Learning - GeeksforGeeks
Nov 27, 2025 · Sparse Autoencoders play an important role in deep learning by learning efficient data representations with minimal redundancy. They are a special type of autoencoder …
These notes describe the sparse autoencoder learning algorithm, which is one approach to automatically learn features from unlabeled data.
We develop a state-of-the-art methodology to reliably train extremely wide and sparse autoencoders with very few dead latents on the activations of any language model. We …
[Machine Learning] SAE (Sparse Autoencoders) - CSDN Blog
Jun 13, 2025 · SAE (Sparse Autoencoders). 0. Introduction: large models have long been treated as a "black box"; it remains unclear how their internal neurons interact to implement their capabilities. Hence, research on mechanistic interpretability …
Sparse Autoencoder for Mechanistic Interpretability - GitHub
A sparse autoencoder model, along with all the underlying PyTorch components you need to customise and/or build your own: Encoder, constrained unit norm decoder and tied bias …
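That repository's actual classes are not reproduced here; the sketch below only illustrates, under assumptions, how those three ingredients (encoder, decoder constrained to unit-norm dictionary directions, and a tied pre-encoder bias) tend to fit together:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySAE(nn.Module):
    """Illustrative sketch only; not the repo's implementation."""

    def __init__(self, d_in: int, d_sae: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_in, d_sae) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_in) * 0.01)
        # Tied bias: subtracted from the input before encoding, added back after decoding.
        self.b_dec = nn.Parameter(torch.zeros(d_in))

    def forward(self, x: torch.Tensor):
        # Constrain each decoder direction (dictionary element) to unit norm.
        W_dec = self.W_dec / self.W_dec.norm(dim=-1, keepdim=True)
        z = F.relu((x - self.b_dec) @ self.W_enc + self.b_enc)  # sparse feature activations
        x_hat = z @ W_dec + self.b_dec                          # reconstruction
        return x_hat, z
```

Normalizing the decoder directions keeps the sparsity penalty meaningful: without it, the model could shrink the code and inflate the decoder weights to dodge the L1 term.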
A Brief Discussion of the Sparse Auto Encoder - Zhihu
As a variant of the Auto Encoder, the SAE (Sparse Auto Encoder) sparsifies its activations and is commonly used to extract interpretable features from large language models. But the recent appearance of a line of work such as cocomix has again revealed that the SAE, as an Auto …
An Intuitive Explanation of Sparse Autoencoders for LLM ...
Jun 11, 2024 · A sparse autoencoder transforms the input vector into an intermediate vector, which can be of higher, equal, or lower dimension compared to the input. When applied to …
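As a purely illustrative example of that dimension choice (the 768-wide input and the 8x expansion are assumptions, not values from the article), the intermediate vector for LLM interpretability is usually made much wider than the input:

```python
import torch
import torch.nn.functional as F

d_model = 768                        # hypothetical LLM activation width
d_latent = 8 * d_model               # intermediate code chosen wider than the input
enc = torch.nn.Linear(d_model, d_latent)

x = torch.randn(d_model)             # stand-in for one LLM activation vector
z = F.relu(enc(x))                   # intermediate (sparse) vector, here 8x wider
print(x.shape, z.shape)              # torch.Size([768]) torch.Size([6144])
```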