CI-Lab is a research group affiliated with the Institute of Advanced Computing Technology (ACT), School of Computer Science and Engineering (SCSE), Beihang University (BUAA).

👏 Paper title: Focus-dLLM: Accelerating Long-Context Diffusion LLM Inference via Confidence-Guided Context Focusing. We propose Focus-dLLM, a training-free attention sparsification framework for long-context diffusion LLM (dLLM) inference. With confidence-guided context focusing and sink-aware pruning, Focus-dLLM reduces redundant bidirectional attention while preserving important attention sinks. This improves long-context dLLM inference efficiency without retraining the model.
GitHub: Longxmas/Focus-dLLM
👏 Paper title: GCoDE: Efficient Device-Edge Co-Inference for GNNs via Architecture-Mapping Co-Search. We propose GCoDE, an architecture-mapping co-search framework for efficient device-edge GNN co-inference. It jointly explores GNN architectures and deployment mappings, balancing communication and computation across device-edge systems. The framework models how graph partitions, neural architectures, and system placement interact, so the search can avoid designs that are accurate but communication-heavy or efficient but accuracy-limited. By co-optimizing the model architecture and system mapping, GCoDE improves inference efficiency while maintaining task performance for device-edge GNN deployment.
👏 Paper title: MIREDO: MIP-Driven Resource-Efficient Dataflow Optimization for Computing-in-Memory Accelerator. We propose MIREDO, a mixed-integer-programming (MIP) driven framework for resource-efficient dataflow optimization on computing-in-memory (CIM) accelerators. By jointly modeling workloads, dataflow strategies, CIM constraints, and latency, MIREDO searches the design space for efficient accelerator configurations. This enables systematic selection of dataflows that better match CIM hardware resources and workload characteristics.
👏 Paper title: CIMinus: Empowering Sparse DNN Workloads Modeling and Exploration on SRAM-based CIM Architectures. We propose CIMinus, a cost modeling framework for sparse DNN workloads on SRAM-based CIM architectures. With the FlexBlock sparsity abstraction and an integrated pruning-to-evaluation workflow, CIMinus supports accurate design space exploration for speedup and energy savings.
👏 Paper title: TinyFormer: Efficient Sparse Transformer Design and Deployment on Tiny Devices. We propose TinyFormer, a framework for designing and deploying sparse transformer models on tiny devices. It combines architecture search, sparse model optimization, and automated deployment to improve inference efficiency under the strict resource constraints of microcontrollers (MCUs).
👏 Paper title: Towards Affordable, Adaptive and Automatic GNN Training on CPU-GPU Heterogeneous Platforms. We introduce A3GNN, a framework for affordable, adaptive, and automatic GNN training on heterogeneous CPU-GPU platforms. It balances throughput, memory footprint, and accuracy through locality-aware sampling and fine-grained scheduling.
👏 Paper title: ACE-GNN: Adaptive GNN Co-Inference with System-Aware Scheduling in Dynamic Edge Environments. We present ACE-GNN, an adaptive GNN co-inference framework for dynamic edge environments. It performs runtime scheme optimization, schedules between pipeline and data parallelism, and uses efficient batching and communication middleware to improve performance and energy efficiency. This enables more robust device-edge GNN serving under changing system conditions.
👏 Paper title: CIMFlow: An Integrated Framework for Systematic Design and Evaluation of Digital CIM Architectures. We introduce CIMFlow, an integrated framework for digital compute-in-memory architectures. It bridges compilation and simulation with a flexible ISA for efficient deep learning execution.
👏 Paper title: Finesse: An Agile Design Framework for Pairing-based Cryptography via Software/Hardware Co-Design. We introduce Finesse, a software-hardware co-design framework for pairing-based cryptography. With a unified IR/ISA/hardware abstraction, a parameterized pipelined architecture, and an optimizing compiler, Finesse raises the throughput of flexible accelerators and makes them more competitive with specialized ASICs. The framework provides an agile path for generating high-performance cryptographic accelerators across pairing workloads.