Focus-dLLM paper accepted to ACL 2026

👏 Paper title: Focus-dLLM: Accelerating Long-Context Diffusion LLM Inference via Confidence-Guided Context Focusing

We propose Focus-dLLM, a training-free attention sparsification framework for long-context inference with diffusion LLMs (dLLMs). It combines confidence-guided context focusing with sink-aware pruning to reduce redundant bidirectional attention while preserving important attention sinks. As a result, Focus-dLLM makes long-context dLLM inference more efficient without any model retraining.
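The post does not spell out the algorithm, so the following is only a hypothetical Python sketch of the general idea of confidence-guided, sink-aware context selection. It is not Focus-dLLM's actual procedure. It assumes that an attention matrix and per-query confidence scores are available, that low-confidence (still uncertain) query positions should decide which context to keep, and that the first `num_sinks` positions act as attention sinks. The function `select_context` and all of its parameters are illustrative names, not part of the Focus-dLLM code.

```python
from typing import List, Sequence


def select_context(
    attn: Sequence[Sequence[float]],  # attn[q][k]: attention weight from query q to context position k
    confidence: Sequence[float],      # per-query confidence (hypothetical signal)
    conf_threshold: float,            # queries below this count as "uncertain"
    top_k: int,                       # how many non-sink context positions to keep
    num_sinks: int,                   # assumption: the first num_sinks positions are sinks
) -> List[int]:
    """Return the sorted indices of context positions to keep (hypothetical sketch)."""
    num_ctx = len(attn[0]) if attn else 0

    # Only low-confidence queries guide the selection; if every query is
    # confident, fall back to using all of them.
    focus = [q for q, c in enumerate(confidence) if c < conf_threshold]
    if not focus:
        focus = list(range(len(confidence)))

    # Total attention each context position receives from the focus queries.
    scores = [sum(attn[q][k] for q in focus) for k in range(num_ctx)]

    # Sink-aware pruning: always retain the assumed sink positions.
    sinks = set(range(min(num_sinks, num_ctx)))

    # Rank the remaining positions by score (ties broken by index), keep top_k.
    rest = sorted(
        (k for k in range(num_ctx) if k not in sinks),
        key=lambda k: (-scores[k], k),
    )
    return sorted(sinks | set(rest[:top_k]))


if __name__ == "__main__":
    attn = [
        [0.5, 0.1, 0.3, 0.1],    # query 0 (confident, ignored)
        [0.4, 0.2, 0.1, 0.3],    # query 1 (uncertain)
        [0.6, 0.05, 0.05, 0.3],  # query 2 (uncertain)
    ]
    confidence = [0.9, 0.2, 0.3]
    print(select_context(attn, confidence, conf_threshold=0.5, top_k=1, num_sinks=1))
```

In this toy run, queries 1 and 2 fall below the threshold, position 0 is kept as a sink, and position 3 is kept because it receives the most attention from those uncertain queries, so the script prints `[0, 3]`. The remaining positions would be pruned from the attention computation.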

GitHub: Longxmas/Focus-dLLM