CI-Lab

Focus-dLLM paper accepted by ACL 2026

Apr 7, 2026

Paper title: Focus-dLLM: Accelerating Long-Context Diffusion LLM Inference via Confidence-Guided Context Focusing. We propose Focus-dLLM for long-context diffusion LLM inference. It uses confidence-guided context focusing and sink-aware pruning to reduce redundant bidirectional attention without retraining.

GitHub: Longxmas/Focus-dLLM

GCoDE paper accepted by IEEE TC

Jan 1, 2026

Paper title: GCoDE: Efficient Device-Edge Co-Inference for GNNs via Architecture-Mapping Co-Search. We propose GCoDE, an architecture-mapping co-search framework for device-edge GNN co-inference. It jointly searches neural architectures and deployment mappings to balance accuracy, computation, and communication.

MIREDO paper accepted by ASP-DAC 2026

Jan 1, 2026

Paper title: MIREDO: MIP-Driven Resource-Efficient Dataflow Optimization for Computing-in-Memory Accelerator. We propose MIREDO for resource-efficient CIM dataflow optimization. It formulates dataflow mapping as a mixed-integer programming problem, jointly modeling workloads, hardware constraints, and latency.

SlimInfer paper accepted by AAAI 2026

Jan 1, 2026

Paper title: SlimInfer: Accelerating Long-Context LLM Inference via Dynamic Token Pruning. We propose SlimInfer for long-context LLM inference. It uses dynamic block-wise token pruning and a predictor-free asynchronous KV cache manager to reduce prefill latency, memory pressure, and I/O overhead.

GitHub: Longxmas/SlimInfer

CIMinus paper accepted by IEEE TC

Dec 31, 2025

Paper title: CIMinus: Empowering Sparse DNN Workloads Modeling and Exploration on SRAM-based CIM Architectures. We propose CIMinus, a modeling framework for sparse DNN workloads on SRAM-based CIM. It estimates latency and energy under different sparsity patterns and mappings, supporting more accurate design-space exploration.

TinyFormer paper accepted by IEEE TCAS-I

Dec 31, 2025

Paper title: TinyFormer: Efficient Sparse Transformer Design and Deployment on Tiny Devices. We propose TinyFormer for sparse transformer design and deployment on tiny devices. It combines architecture search, sparse model optimization, and deployment support to make transformer inference feasible under strict MCU resource budgets.

A3GNN paper accepted by ICCD 2025

Nov 10, 2025

Paper title: Towards Affordable, Adaptive and Automatic GNN Training on CPU-GPU Heterogeneous Platforms. We introduce A3GNN for adaptive GNN training on CPU-GPU platforms. It combines locality-aware sampling and fine-grained scheduling to balance throughput, memory footprint, and training quality on heterogeneous systems.

GitHub: BUAA-CI-LAB/A3GNN

ACE-GNN paper accepted by IEEE TCAD

Sep 28, 2025

Paper title: ACE-GNN: Adaptive GNN Co-Inference with System-Aware Scheduling in Dynamic Edge Environments. We present ACE-GNN for adaptive GNN co-inference in dynamic edge environments. It performs runtime scheme optimization and switches between pipeline and data parallelism to handle changing bandwidth, load, and access patterns.

CIMFlow paper accepted by DAC 2025

Jun 23, 2025

Paper title: CIMFlow: An Integrated Framework for Systematic Design and Evaluation of Digital CIM Architectures. We introduce CIMFlow, an integrated framework for digital compute-in-memory architectures. It connects ISA design, MLIR-based compilation, and SystemC simulation so researchers can prototype and evaluate CIM systems more systematically.

Finesse paper accepted by ISCA 2025

Jun 20, 2025

Paper title: Finesse: An Agile Design Framework for Pairing-based Cryptography via Software/Hardware Co-Design. We introduce Finesse, a software-hardware co-design framework for pairing-based cryptography. It integrates compiler support, simulation, and parameterized pipelined hardware to speed up accelerator design and execution.

GitHub: BUAA-CI-LAB/Finesse

Sparse SRAM-PIM paper accepted by IEEE TCAD

Jun 16, 2025

Paper title: Efficient SRAM-PIM Co-Design by Joint Exploration of Value-Level and Bit-Level Sparsity. We jointly exploit value-level and bit-level sparsity for SRAM-PIM co-design. The framework skips both structured zero values and redundant zero bits, reducing unnecessary digital SRAM-PIM computation.

GNN Co-Inference paper accepted by DAC 2024

Jun 23, 2024

Paper title: Graph Neural Networks Automated Design and Deployment on Device-Edge Co-Inference Systems. We study automated GNN design and deployment for device-edge co-inference. The framework jointly explores neural architectures and partitioning schemes to balance computation and communication costs.

DB-PIM paper accepted by DAC 2024

Jun 23, 2024

Paper title: Towards Efficient SRAM-PIM Architecture Design by Exploiting Unstructured Bit-Level Sparsity. We propose DB-PIM for efficient SRAM-PIM design. The framework exploits unstructured bit-level sparsity with sparsity-preserving algorithms and specialized computation units to reduce redundant PIM operations.

GNNavigator paper accepted by DAC 2024

Jun 23, 2024

Paper title: GNNavigator: Towards Adaptive Training of Graph Neural Networks via Automatic Guideline Exploration. We introduce GNNavigator for adaptive GNN training optimization. It uses software-hardware co-abstraction and performance modeling to explore training guidelines that balance runtime, memory use, and accuracy.

DDC-PIM paper accepted by IEEE TCAD

Nov 7, 2023

Paper title: DDC-PIM: Efficient Algorithm/Architecture Co-Design for Doubling Data Capacity of SRAM-Based Processing-in-Memory. We propose DDC-PIM for SRAM-based PIM. By exploiting the cross-coupled structure of 6T SRAM cells, DDC-PIM increases effective data capacity and improves the practicality of in-memory DNN acceleration.

GNN aggregation paper accepted by IEEE CAL

Nov 1, 2023

Paper title: Architectural Implications of GNN Aggregation Programming Abstractions. We evaluate how GNN aggregation programming abstractions affect architecture behavior. The work builds a taxonomy around data organization and propagation patterns, helping explain how abstraction choices shape performance across platforms.

L2 compression paper accepted by ICCV 2023

Oct 2, 2023

Paper title: Lossy and Lossless (L2) Post-training Model Size Compression. We propose an L2 post-training model compression framework. It combines lossy transformation with lossless coding to reduce model size while preserving accuracy and supporting controllable compression targets.

Hardware-aware GNN NAS paper accepted by DAC 2023

Jul 9, 2023

Paper title: Hardware-Aware Graph Neural Network Automated Design for Edge Computing Platforms. We explore hardware-aware neural architecture search for GNNs on edge platforms. The method considers device heterogeneity and deployment cost during model search, improving the balance between accuracy and practical execution efficiency.

In-cache MPUF paper accepted by IEEE TCAS-I

Apr 1, 2022

Paper title: Reconfigurable and Dynamically Transformable In-Cache-MPUF System With True Randomness Based on the SOT-MRAM. We present a reconfigurable in-cache PUF system based on SOT-MRAM. The design uses device randomness and transformable memory behavior to build flexible security primitives inside cache structures.

NAND-SPIN PIM paper accepted by SCIS

Apr 1, 2022

Paper title: NAND-SPIN-Based Processing-in-MRAM Architecture for Convolutional Neural Network Acceleration. We propose a NAND-SPIN-based processing-in-MRAM architecture for CNN acceleration. The design exploits in-memory logic and data-local execution to reduce movement between memory and compute units.

Graph CC PIM paper accepted by IEEE TCAD

Mar 1, 2022

Paper title: Accelerating Graph Connected Component Computation with Emerging Processing-In-Memory Architecture. We propose an algorithm-architecture co-design for graph connected components with PIM. The work adapts graph traversal and data organization to memory-local execution, reducing traffic in irregular graph analytics workloads.

Eventor paper accepted by DAC 2022

Feb 1, 2022

Paper title: Eventor: An Efficient Event-Based Monocular Multi-View Stereo Accelerator on FPGA Platform. We propose Eventor, an FPGA accelerator for event-based monocular multi-view stereo. It accelerates event back-projection and volumetric ray-counting, improving throughput and energy efficiency for event-camera-based 3D vision.

Triangle counting paper accepted by IEEE TC

Nov 1, 2021

Paper title: Triangle Counting Accelerations: From Algorithm to In-Memory Computing Architecture. We propose a processing-in-memory approach for triangle counting, a core graph analytics primitive. The work reduces expensive graph data movement by rethinking both the algorithm and the architecture around memory-local computation.

FedSkel paper accepted by CIKM 2021

Oct 1, 2021

Paper title: FedSkel: Efficient Federated Learning on Heterogeneous Systems with Skeleton Gradients Update. We propose FedSkel, an efficient federated learning framework for heterogeneous edge systems. FedSkel identifies compact skeleton gradients and updates only the most essential model components, reducing computation and communication overhead while keeping federated training effective on resource-constrained devices.

S2Engine paper accepted by IEEE TC

Jun 1, 2021

Paper title: S2Engine: A Novel Systolic Architecture for Sparse Convolutional Neural Networks. We propose S2Engine, a systolic architecture for accelerating sparse CNNs. It coordinates sparse computation, data reuse, and scheduling so that sparsity can improve efficiency without destroying the regular execution style of systolic arrays.

GNN memory optimization paper accepted by RTAS 2021

May 1, 2021

Paper title: Optimizing Memory Efficiency of Graph Neural Networks on Edge Computing Platforms. We propose a feature decomposition method for memory-efficient GNN inference on edge platforms. The method lowers peak memory pressure by splitting feature processing into manageable pieces, helping constrained devices execute graph workloads more reliably.
