MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers, Wenhui Wang, et al., arXiv, 2020.