Selected Publications
Conference Papers
- High-Throughput, Cost-Effective Billion-Scale Vector Search with a Single GPU.
- Cost-efficient Archive Cloud Storage with Tape: Design and Deployment.
  Qing Wang, Fan Yang, Qiang Liu, Geng Xiao, Yongpeng Chen, Hao Lan, Leiming Chen, Bangzhu Chen, Chenrui Liu, Pingchang Bai, Bin Huang, Zigan Luo, Mingyu Xie, Yu Wang, Youyou Lu, Huatao Wu, Jiwu Shu
  The 24th USENIX Conference on File and Storage Technologies (FAST'26), 2026
- OdinANN: Direct Insert for Consistently Stable Performance in Billion-Scale Graph-Based Vector Search.
- Weaver: Efficient Multi-LLM Serving with Attention Offloading.
- GPreempt: GPU Preemptive Scheduling Made General and Efficient.
- Stripeless Data Placement for Erasure-Coded In-Memory Storage.
- Achieving Low-Latency Graph-Based Vector Search via Aligning Best-First Search Algorithm with SSD.
- ShiftLock: Mitigate One-sided RDMA Lock Contention via Handover.
- Frugal: Efficient and Economic Embedding Model Training with Commodity GPUs.
- Medusa: Accelerating Serverless LLM Inference with Materialization.
- Achieving Wire-Latency Storage Systems by Exploiting Hardware ACKs.
- Fast State Restoration in LLM Serving with HCache.
- Deft: A Scalable Tree Index for Disaggregated Memory.
- Designing an Efficient Tree Index on Disaggregated Memory.
- Accelerating Distributed Filesystem Metadata Service via Decoupling Directory Semantics from Metadata Indexing.
- Fast Core Scheduling with Userspace Process Abstraction.
- A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications.
  Lei Chen, Shi Liu, Chenxi Wang, Haoran Ma, Yifan Qiao, Zhe Wang, Chenggang Wu, Youyou Lu, Xiaobing Feng, Huimin Cui, Shan Lu, Harry Xu
  18th USENIX Symposium on Operating Systems Design and Implementation (OSDI'24), 2024
- Ares-Flash: Efficient Parallel Integer Arithmetic Operations Using NAND Flash Memory.
- MaxEmbed: Maximizing SSD Bandwidth Utilization for Huge Embedding Models Serving.
- Volley: Accelerating Write-Read Orders in Disaggregated Storage.
- Exploring the Asynchrony of Slow Memory Filesystem with EasyIO.
- TeRM: Extending RDMA-Attached Memory with SSD.
- Revisiting Secondary Indexing in LSM-based Storage Systems with Persistent Memory.
- SingularFS: A Billion-Scale Distributed File System Using a Single Metadata Server.
- PetPS: Supporting Huge Embedding Models with Persistent Memory.
- λ-IO: A Unified IO Stack for Computational Storage.
- Citron: Distributed Range Lock Management with One-sided RDMA.
- Patronus: High-Performance and Protective Remote Memory.
- Mobius: Fine Tuning Large-scale Models on Commodity GPU Servers.
- Replicating Persistent Memory Key-Value Stores with Efficient RDMA Abstraction.
- RIO: Order-Preserving and CPU-Efficient Remote Storage Access.
- SwitchTx: Scalable In-Network Coordination for Distributed Transaction Processing.
- Pacman: An Efficient Compaction Approach for Log-Structured Key-Value Store on Persistent Memory.
- AlNiCo: SmartNIC-accelerated Contention-aware Request Scheduling for Transaction Processing.
- Fleche: An Efficient GPU Embedding Cache for Personalized Recommendations.
- InfiniFS: An Efficient Metadata Service for Large-Scale Distributed Filesystems.
- Plor: General Transactions with Predictable, Low Tail Latency.
- Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory.
- Crash Consistent Non-Volatile Memory Express.
- ParaBit: Processing Parallel Bitwise Operations in NAND Flash Memory based SSDs.
- Max: A Multicore-Accelerated File System for Flash Storage.
- Nap: A Black-Box Approach to NUMA-Aware Persistent Memory Indexes.
- Aria: Tolerating Skewed Workloads in Secure In-memory Key-value Stores.
- Scalable Persistent Memory File System with Kernel-Userspace Collaboration.
- Concordia: Distributed Shared Memory with In-Network Cache Coherence.
- Kraken: Memory Efficient Continual Learning for Large-Scale Real-Time Recommendations.
  Minhui Xie, Kai Ren, Youyou Lu, Guangxu Yang, Qingxing Xu, Bihai Wu, Jiazhen Lin, Hongbo Ao, Wanhong Xu, Jiwu Shu
  Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'20), 2020
- Write Dependency Disentanglement with HORAE.
- μTree: a Persistent B+-Tree with Low Tail Latency.
- Improving the Concurrency Performance of Persistent Memory Transactions on Multicores.
- CoinPurse: A Device-Assisted File System with Dual Interfaces.
- FlatStore: an Efficient Log-Structured Key-Value Storage Engine for Persistent Memory.
- No Compromises: Secure NVM with Crash Consistency, Write-Efficiency and High-Performance.
- ASCache: An Approximate SSD Cache for Error-Tolerant Applications.
- Scalable RDMA RPC on Reliable Connection with Efficient Resource Sharing.
- LerGAN: A Zero-Free, Low Data Movement and PIM-Based GAN Architecture.
  Haiyu Mao, Mingcong Song, Tao Li, Yuting Dai, Jiwu Shu
  51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'18), 2018
- LocoFS: A loosely-coupled metadata service for distributed file systems.
  Siyang Li, Youyou Lu, Jiwu Shu, Yang Hu, Tao Li
  Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'17), 2017
- Log-structured non-volatile main memory.
  Qingda Hu, Jinglei Ren, Anirudh Badam, Jiwu Shu, Thomas Moscibroda
  USENIX Annual Technical Conference (USENIX ATC'17), 2017
- Octopus: an RDMA-enabled Distributed Persistent Memory File System.
- A high performance file system for non-volatile main memory.
- ParaFS: A log-structured file system to exploit the internal parallelism of flash devices.
- Blurred persistence in transactional persistent memory.
- Loose-ordering consistency for persistent memory.
- ReconFS: A reconstructable file system on flash storage.
- Aegis: Partitioning data block for efficient recovery of stuck-at-faults in phase change memory.
  Jie Fan, Song Jiang, Jiwu Shu, Youhui Zhang, Weimin Zheng
  Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'13), 2013
- Extending the lifetime of flash-based storage through reducing write amplification from file systems.
Journal Papers
- StageWise: Accelerating Persistent Key-Value Stores by Thread Model Redesigning.
- Efficiently Enlarging RDMA-Attached Memory with SSD.
- Perseid: A Secondary Indexing Mechanism for LSM-based Storage Systems.
- Building Write-Optimized Tree Indexes on Disaggregated Memory.
- TH-iSSD: Design and Implementation of a Generic and Reconfigurable Near-Data Processing Framework.
- Efficient Crash Consistency for NVMe over PCIe and RDMA.
- Nap: Persistent Memory Indexes for NUMA Architectures.
- Reprogramming 3D TLC Flash Memory based Solid State Drives.
- Octopus+: an RDMA-enabled Distributed Persistent Memory File System.
- LrGAN: A Compact and Energy Efficient PIM-based Architecture for GAN Training.
- TH-DPMS: Design and Implementation of an RDMA-enabled Distributed Persistent Memory Storage System.
- Towards Unaligned Writes Optimization in Cloud Storage with High-performance SSDs.
- ShieldNVM: An Efficient and Fast Recoverable System for Secure Non-Volatile Memory.
- Cross-Rack-Aware Single Failure Recovery for Clustered File Systems.
- Mitigating Synchronous I/O Overhead in File Systems on Open-Channel SSDs.
- Correlation-Aware Stripe Organization for Efficient Writes in Erasure-Coded Storage: Algorithms and Evaluation.
- A Flattened Metadata Service for Distributed File Systems.
- Efficient and Consistent NVMM Cache for SSD-based File System.
- HiNFS: A Persistent Memory File System with Both Buffering and Direct-Access.
- Encoding-Aware Data Placement for Efficient Degraded Reads in XOR-Coded Storage Systems: Algorithms and Evaluation.
- FlashKV: Accelerating KV performance with open-channel SSDs.
- Seek-efficient I/O optimization in single failure recovery for XOR-coded storage systems.
- Short code: An efficient RAID-6 MDS code for optimizing degraded reads and partial stripe writes.
- Parity-switched data placement: Optimizing partial stripe writes in XOR-coded storage systems.
- HV code: An all-around MDS code for RAID-6 storage systems.
- Reconsidering single disk failure recovery for erasure coded storage systems: Optimizing load balancing in stack-level.
- Blurred persistence: Efficient transactions in persistent memory.
- Supporting system consistency with differential transactions in flash-based SSDs.
- CaCo: An efficient Cauchy coding approach for cloud storage systems.
- High-performance and lightweight transaction support in flash-based SSDs.
- Redistribute Data to Regain Load Balance during RAID-4 Scaling.
- Design and implementation of an asymmetric block-based parallel file system.
- Generalized X-code: An efficient RAID-6 code for arbitrary size of disk array.
- Preventing Silent Data Corruptions from Propagating During Data Reconstruction.
- SOPA: Selecting the Optimal Policy Adaptively for a cache system.
- DACO: A High Performance Disk Architecture Designed Specially for Large Scale Erasure Coded Storage Systems.
- ALV: A New Data Redistribution Approach to RAID-5 Scaling.
- GRID Codes: Strip-based Erasure Codes with High Fault Tolerance for Storage Systems.
- SLAS: An Efficient Approach to Scaling Round-robin Striped Volumes.
- Design and Implementation of an Out-of-Band Virtualization System for Large SANs.
- Design and Implementation of a SAN System Based on the Fiber Channel Protocol.
- A Parallel Transient Stability Simulation for Power Systems.
Full Publications