CI-Lab is a research group affiliated with the Institute of Advanced Computing Technology (ACT), School of Computer Science and Engineering (SCSE), Beihang University (BUAA).
👏 Paper title: MIREDO: MIP-Driven Resource-Efficient Dataflow Optimization for Computing-in-Memory Accelerator. We propose MIREDO, a framework that formulates dataflow optimization for Computing-in-Memory (CIM) accelerators as a Mixed-Integer Programming problem. By jointly modeling workload characteristics, dataflow strategies, and CIM-specific constraints with an analytical latency model, MIREDO navigates the vast design space to find optimal configurations, achieving up to 3.2× performance improvement across various DNN models. [related project]
👏 Paper title: SlimInfer: Accelerating Long-Context LLM Inference via Dynamic Token Pruning. We identify an information diffusion phenomenon in LLMs, where critical token information spreads across the sequence, enabling aggressive pruning in later layers. Based on this, we propose SlimInfer, which performs dynamic block-wise pruning with a predictor-free asynchronous KV cache manager, achieving up to 2.53× TTFT speedup and 1.88× latency reduction on LLaMA-3.1-8B-Instruct. [related project]
👏 Paper title: CIMinus: Empowering Sparse DNN Workloads Modeling and Exploration on SRAM-based CIM Architectures. We propose CIMinus, a cost modeling framework for efficient design space exploration of sparse DNN workloads on SRAM-based compute-in-memory architectures. It introduces FlexBlock, an expressive sparsity abstraction, and provides an integrated workflow from model pruning to system-level evaluation, estimating speedups and energy savings with an error of at most 5.27%. [related project]
👏 Paper title: TinyFormer: Efficient Sparse Transformer Design and Deployment on Tiny Devices. We propose TinyFormer, a framework for developing and deploying resource-efficient transformer models on Microcontrollers (MCUs). Integrating architecture search, sparse model optimization, and automated deployment, it achieves 96.1% accuracy on CIFAR-10 under strict hardware constraints, delivering up to 12.2× inference speedup compared to CMSIS-NN. [related project]
👏 Paper title: Towards Affordable, Adaptive and Automatic GNN Training on CPU-GPU Heterogeneous Platforms. In this paper, we introduce A3GNN, a framework for Affordable, Adaptive, and Automatic GNN training on heterogeneous CPU-GPU platforms. It improves resource utilization through locality-aware sampling and fine-grained parallelism scheduling. Moreover, it uses reinforcement learning to explore the design space and achieve Pareto-optimal trade-offs among throughput, memory footprint, and accuracy. [related project]
👏 Paper title: ACE-GNN: Adaptive GNN Co-Inference with System-Aware Scheduling in Dynamic Edge Environments. We present ACE-GNN, the first adaptive GNN co-inference framework for dynamic edge environments. It enables rapid runtime scheme optimization and adaptive scheduling between pipeline and data parallelism, coupled with efficient batching and communication middleware, achieving up to 12.7× speedup and 82.3% energy savings. [related project]
👏 Paper title: CIMFlow: An Integrated Framework for Systematic Design and Evaluation of Digital CIM Architectures. We introduce CIMFlow, an integrated framework that bridges compilation and simulation with a flexible ISA for digital Compute-in-Memory architectures. It addresses SRAM capacity limits through advanced partitioning and parallelism, achieving up to 2.8× speedup and 61.7% energy reduction across diverse deep learning workloads. [related project]