Computational Intelligence Laboratory @ Beihang University

CI-Lab is a research group affiliated with the Institute of Advanced Computing Technology (ACT), School of Computer Science and Engineering (SCSE), Beihang University (BUAA).


Focus-dLLM paper accepted by ACL 2026

👏 Paper title: Focus-dLLM: Accelerating Long-Context Diffusion LLM Inference via Confidence-Guided Context Focusing. We propose Focus-dLLM, a training-free attention sparsification framework for long-context diffusion LLM inference. With confidence-guided context focusing and sink-aware pruning, Focus-dLLM reduces redundant bidirectional attention while preserving important attention sinks. This improves long-context dLLM inference efficiency without requiring model retraining.

GitHub: Longxmas/Focus-dLLM
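To make confidence-guided focusing with sink-aware pruning more concrete, here is a minimal Python sketch. It is not the Focus-dLLM implementation. The confidence-threshold rule, the importance score (summed attention logits from the focused queries), and the names `focus_queries`, `select_context` and `sparse_attention` are illustrative assumptions. Low-confidence positions are treated as the queries still being decoded, the leading `num_sinks` positions are always kept as attention sinks, and attention is then computed only over the retained context.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def focus_queries(confidence: List[float], threshold: float) -> List[int]:
    # Hypothetical rule: positions whose prediction confidence is still low
    # are the ones being decoded, so only they act as queries.
    return [i for i, c in enumerate(confidence) if c < threshold]


def select_context(queries: List[Vector], keys: List[Vector], focus: List[int],
                   num_sinks: int, keep: int) -> List[int]:
    # Hypothetical importance score: summed attention logits from the focused queries.
    importance = [sum(dot(queries[q], k) for q in focus) for k in keys]
    # Sink-aware pruning: the first `num_sinks` positions are never dropped.
    sinks = list(range(min(num_sinks, len(keys))))
    rest = sorted(range(len(sinks), len(keys)),
                  key=lambda j: importance[j], reverse=True)
    return sorted(sinks + rest[:keep])


def sparse_attention(query: Vector, keys: List[Vector], values: List[Vector],
                     kept: List[int]) -> Vector:
    # Scaled dot-product attention restricted to the retained key positions.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[j]) * scale for j in kept])
    dim = len(values[0])
    return [sum(w * values[j][d] for w, j in zip(weights, kept)) for d in range(dim)]


# Toy example: the same vectors serve as queries and keys.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
values = [[float(i)] for i in range(len(tokens))]
confidence = [0.99, 0.95, 0.30, 0.90, 0.20]

focus = focus_queries(confidence, threshold=0.5)
kept = select_context(tokens, tokens, focus, num_sinks=1, keep=2)
print(focus)  # [2, 4]
print(kept)   # [0, 1, 2]
print(sparse_attention(tokens[focus[0]], tokens, values, kept))
```

Position 0 survives only because it is a sink; the remaining budget `keep` goes to the highest-scoring context positions, so low-scoring positions 3 and 4 are excluded from the attention computation.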

GCoDE paper accepted by IEEE TC

👏 Paper title: GCoDE: Efficient Device-Edge Co-Inference for GNNs via Architecture-Mapping Co-Search. We propose GCoDE, an architecture-mapping co-search framework for efficient device-edge GNN co-inference. It jointly explores GNN architectures and deployment mappings, balancing communication and computation across device-edge systems. The framework models how graph partitions, neural architectures, and system placement interact, so the search can avoid designs that are accurate but communication-heavy or efficient but accuracy-limited. By co-optimizing the model architecture and system mapping, GCoDE improves inference efficiency while maintaining task performance for device-edge GNN deployment.

MIREDO paper accepted by ASP-DAC 2026

👏 Paper title: MIREDO: MIP-Driven Resource-Efficient Dataflow Optimization for Computing-in-Memory Accelerator. We propose MIREDO, a mixed-integer-programming-driven framework for resource-efficient CIM dataflow optimization. By jointly modeling workloads, dataflow strategies, CIM constraints, and latency, MIREDO searches the design space for efficient accelerator configurations. This enables systematic selection of dataflows that better match CIM hardware resources and workload characteristics.

SlimInfer paper accepted by AAAI 2026

👏 Paper title: SlimInfer: Accelerating Long-Context LLM Inference via Dynamic Token Pruning. We propose SlimInfer, a method for accelerating long-context LLM inference. It performs dynamic block-wise token pruning and pairs it with a predictor-free asynchronous KV cache manager to reduce inference latency.
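Block-wise token pruning can be illustrated with a small Python sketch. It assumes per-token importance scores are already available (for example, derived from attention weights at the current layer), averages them over fixed-size blocks, and keeps the top-scoring blocks plus the final block. The scoring rule, the always-keep-the-last-block policy, and the names `block_scores` and `prune_blocks` are hypothetical, and SlimInfer's asynchronous KV cache manager is not modeled.

```python
from typing import List, Sequence


def block_scores(token_scores: Sequence[float], block_size: int) -> List[float]:
    # Average the per-token importance within each contiguous block.
    scores = []
    for start in range(0, len(token_scores), block_size):
        block = token_scores[start:start + block_size]
        scores.append(sum(block) / len(block))
    return scores


def prune_blocks(token_scores: Sequence[float], block_size: int,
                 keep_blocks: int) -> List[int]:
    """Return indices of tokens that survive block-wise pruning.

    Hypothetical policy: keep the `keep_blocks - 1` highest-scoring blocks plus
    the final block, which holds the most recent tokens.
    """
    if not token_scores:
        return []
    scores = block_scores(token_scores, block_size)
    last = len(scores) - 1
    ranked = sorted(range(last), key=lambda b: scores[b], reverse=True)
    chosen = sorted(ranked[:max(keep_blocks - 1, 0)] + [last])
    kept: List[int] = []
    for b in chosen:
        start = b * block_size
        kept.extend(range(start, min(start + block_size, len(token_scores))))
    return kept


scores = [0.1, 0.2, 0.9, 0.8, 0.05, 0.05, 0.4, 0.6]
print(prune_blocks(scores, block_size=2, keep_blocks=2))  # [2, 3, 6, 7]
```

Recomputing the scores and calling `prune_blocks` again at later layers, on the surviving tokens, is one way to make the pruning dynamic; the returned indices are then relative to the surviving sequence.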

CIMinus paper accepted by IEEE TC

👏 Paper title: CIMinus: Empowering Sparse DNN Workloads Modeling and Exploration on SRAM-based CIM Architectures. We propose CIMinus, a cost modeling framework for sparse DNN workloads on SRAM-based CIM architectures. With the FlexBlock sparsity abstraction and an integrated pruning-to-evaluation workflow, CIMinus enables accurate design space exploration of the speedup and energy savings that sparsity can deliver.
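As a rough illustration of why block-structured sparsity matters on CIM hardware, the Python sketch below partitions a weight matrix into fixed-size blocks (standing in for CIM macro tiles), counts the blocks that contain any nonzero weight, and reports an idealized speedup of total blocks divided by active blocks. This is a generic block-sparsity calculation, not the FlexBlock abstraction or the CIMinus cost model; the block shape and the assumption that latency scales with the number of active blocks are illustrative only, and the names `count_active_blocks` and `ideal_block_speedup` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def count_active_blocks(weights: Matrix, block_rows: int, block_cols: int) -> int:
    """Count blocks holding at least one nonzero weight (all-zero blocks are skippable)."""
    rows, cols = len(weights), len(weights[0])
    active = 0
    for r0 in range(0, rows, block_rows):
        for c0 in range(0, cols, block_cols):
            if any(
                weights[r][c] != 0.0
                for r in range(r0, min(r0 + block_rows, rows))
                for c in range(c0, min(c0 + block_cols, cols))
            ):
                active += 1
    return active


def ideal_block_speedup(weights: Matrix, block_rows: int, block_cols: int) -> float:
    """Total blocks / active blocks, assuming latency scales with active blocks."""
    rows, cols = len(weights), len(weights[0])
    total = math.ceil(rows / block_rows) * math.ceil(cols / block_cols)
    active = count_active_blocks(weights, block_rows, block_cols)
    return total / active if active else float("inf")


W = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 0.0],
]
print(count_active_blocks(W, 2, 2))  # 2
print(ideal_block_speedup(W, 2, 2))  # 2.0
```

The same three nonzeros give a different bound for a different block shape, which is the kind of interaction between sparsity pattern and hardware granularity that a real cost model has to capture.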

TinyFormer paper accepted by IEEE TCAS-I

👏 Paper title: TinyFormer: Efficient Sparse Transformer Design and Deployment on Tiny Devices. We propose TinyFormer, a framework for designing and deploying sparse transformer models on tiny devices. It combines architecture search, sparse model optimization, and automated deployment to improve inference efficiency under strict MCU resource constraints.

A3GNN paper accepted by ICCD 2025

👏 Paper title: Towards Affordable, Adaptive and Automatic GNN Training on CPU-GPU Heterogeneous Platforms. We introduce A3GNN for adaptive GNN training on heterogeneous CPU-GPU platforms. It balances throughput, memory footprint, and accuracy through locality-aware sampling and fine-grained scheduling.

ACE-GNN paper accepted by IEEE TCAD

👏 Paper title: ACE-GNN: Adaptive GNN Co-Inference with System-Aware Scheduling in Dynamic Edge Environments. We present ACE-GNN, an adaptive GNN co-inference framework for dynamic edge environments. It optimizes the co-inference scheme at runtime, switches its schedule between pipeline parallelism and data parallelism, and uses efficient batching and communication middleware to improve performance and energy efficiency. This enables more robust device-edge GNN serving under changing system conditions.

CIMFlow paper accepted by DAC 2025

👏 Paper title: CIMFlow: An Integrated Framework for Systematic Design and Evaluation of Digital CIM Architectures. We introduce CIMFlow, an integrated framework for the systematic design and evaluation of digital compute-in-memory (CIM) architectures. It bridges compilation and simulation through a flexible ISA to support efficient deep learning execution.

Finesse paper accepted by ISCA 2025

👏 Paper title: Finesse: An Agile Design Framework for Pairing-based Cryptography via Software/Hardware Co-Design. We introduce Finesse, a software-hardware co-design framework for pairing-based cryptography. With a unified IR/ISA/hardware abstraction, a parameterized pipelined architecture, and an optimizing compiler, Finesse improves the throughput of flexible accelerators and makes them more competitive with specialized ASICs. The framework provides an agile path for generating high-performance cryptographic accelerators across pairing workloads.
