Selected Publications
Conference Papers
- Frugal: Efficient and Economic Embedding Model Training with Commodity GPUs.
- Medusa: Accelerating Serverless LLM Inference with Materialization.
- Achieving Wire-Latency Storage Systems by Exploiting Hardware ACKs.
- Fast State Restoration in LLM Serving with HCache.
- Deft: A Scalable Tree Index for Disaggregated Memory.
- Fast Core Scheduling with Userspace Process Abstraction.
- Ares-Flash: Efficient Parallel Integer Arithmetic Operations Using NAND Flash Memory.
- MaxEmbed: Maximizing SSD Bandwidth Utilization for Huge Embedding Models Serving.
- Volley: Accelerating Write-Read Orders in Disaggregated Storage.
- Exploring the Asynchrony of Slow Memory Filesystem with EasyIO.
- TeRM: Extending RDMA-Attached Memory with SSD.
- Revisiting Secondary Indexing in LSM-based Storage Systems with Persistent Memory.
- SingularFS: A Billion-Scale Distributed File System Using a Single Metadata Server.
- PetPS: Supporting Huge Embedding Models with Persistent Memory.
- λ-IO: A Unified IO Stack for Computational Storage.
- Citron: Distributed Range Lock Management with One-sided RDMA.
- Patronus: High-Performance and Protective Remote Memory.
- Mobius: Fine Tuning Large-scale Models on Commodity GPU Servers.
- Replicating Persistent Memory Key-Value Stores with Efficient RDMA Abstraction.
- RIO: Order-Preserving and CPU-Efficient Remote Storage Access.
- SwitchTx: Scalable In-Network Coordination for Distributed Transaction Processing.
- Pacman: An Efficient Compaction Approach for Log-Structured Key-Value Store on Persistent Memory.
- AlNiCo: SmartNIC-accelerated Contention-aware Request Scheduling for Transaction Processing.
- Fleche: An Efficient GPU Embedding Cache for Personalized Recommendations.
- InfiniFS: An Efficient Metadata Service for Large-Scale Distributed Filesystems.
- Plor: General Transactions with Predictable, Low Tail Latency.
- Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory.
- Crash Consistent Non-Volatile Memory Express.
- ParaBit: Processing Parallel Bitwise Operations in NAND Flash Memory based SSDs.
- Max: A Multicore-Accelerated File System for Flash Storage.
- Nap: A Black-Box Approach to NUMA-Aware Persistent Memory Indexes.
- Aria: Tolerating Skewed Workloads in Secure In-memory Key-value Stores.
- Scalable Persistent Memory File System with Kernel-Userspace Collaboration.
- Concordia: Distributed Shared Memory with In-Network Cache Coherence.
- Kraken: Memory Efficient Continual Learning for Large-Scale Real-Time Recommendations.
Minhui Xie,
Kai Ren,
Youyou Lu,
Guangxu Yang,
Qingxing Xu,
Bihai Wu,
Jiazhen Lin,
Hongbo Ao,
Wanhong Xu,
Jiwu Shu
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'20), 2020
Paper
Slides
Code
- Write Dependency Disentanglement with HORAE.
- μTree: a Persistent B+-Tree with Low Tail Latency.
- Improving the Concurrency Performance of Persistent Memory Transactions on Multicores.
- CoinPurse: A Device-Assisted File System with Dual Interfaces.
- FlatStore: an Efficient Log-Structured Key-Value Storage Engine for Persistent Memory.
- No Compromises: Secure NVM with Crash Consistency, Write-Efficiency and High-Performance.
- ASCache: An Approximate SSD Cache for Error-Tolerant Applications.
- Scalable RDMA RPC on Reliable Connection with Efficient Resource Sharing.
- LerGAN: A Zero-Free, Low Data Movement and PIM-Based GAN Architecture.
Haiyu Mao,
Mingcong Song,
Tao Li,
Yuting Dai,
Jiwu Shu
51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'18), 2018
Paper
- LocoFS: A loosely-coupled metadata service for distributed file systems.
Siyang Li,
Youyou Lu,
Jiwu Shu,
Yang Hu,
Tao Li
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'17), 2017
Paper
- Log-structured non-volatile main memory.
Qingda Hu,
Jinglei Ren,
Anirudh Badam,
Jiwu Shu,
Thomas Moscibroda
USENIX Annual Technical Conference (USENIX ATC'17), 2017
Paper
- Octopus: an RDMA-enabled Distributed Persistent Memory File System.
- A high performance file system for non-volatile main memory.
Jiaxin Ou,
Jiwu Shu,
Youyou Lu
Proceedings of the Eleventh European Conference on Computer Systems (EuroSys'16), 2016
Paper
- ParaFS: A log-structured file system to exploit the internal parallelism of flash devices.
- Blurred persistence in transactional persistent memory.
- Loose-ordering consistency for persistent memory.
Youyou Lu,
Jiwu Shu,
Long Sun,
Onur Mutlu
IEEE 32nd International Conference on Computer Design (ICCD'14), 2014
Paper
- ReconFS: A reconstructable file system on flash storage.
Youyou Lu,
Jiwu Shu,
Wei Wang
Proceedings of the 12th USENIX Conference on File and Storage Technologies (FAST'14), 2014
Paper
- Aegis: Partitioning data block for efficient recovery of stuck-at-faults in phase change memory.
Jie Fan,
Song Jiang,
Jiwu Shu,
Youhui Zhang,
Weimin Zheng
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'13), 2013
Paper
- Extending the lifetime of flash-based storage through reducing write amplification from file systems.
Youyou Lu,
Jiwu Shu,
Weimin Zheng
Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST'13), 2013
Paper
Journal Papers
- Efficiently Enlarging RDMA-Attached Memory with SSD.
- Perseid: A Secondary Indexing Mechanism for LSM-based Storage Systems.
- Building Write-Optimized Tree Indexes on Disaggregated Memory.
- TH-iSSD: Design and Implementation of a Generic and Reconfigurable Near-Data Processing Framework.
- Efficient Crash Consistency for NVMe over PCIe and RDMA.
- Nap: Persistent Memory Indexes for NUMA Architectures.
- Reprogramming 3D TLC Flash Memory based Solid State Drives.
Congming Gao,
Min Ye,
Chun Jason Xue,
Youtao Zhang,
Liang Shi,
Jiwu Shu,
Jun Yang
ACM Transactions on Storage
(TOS),
2022
Paper
- Octopus+: an RDMA-enabled Distributed Persistent Memory File System.
- LrGAN: A Compact and Energy Efficient PIM-based Architecture for GAN Training.
- TH-DPMS: Design and Implementation of an RDMA-enabled Distributed Persistent Memory Storage System.
- Towards Unaligned Writes Optimization in Cloud Storage with High-performance SSDs.
Jiwu Shu,
Fei Li,
Siyang Li,
Youyou Lu
IEEE Transactions on Parallel and Distributed Systems
(TPDS),
2020
Paper
- ShieldNVM: An Efficient and Fast Recoverable System for Secure Non-Volatile Memory.
- Cross-Rack-Aware Single Failure Recovery for Clustered File Systems.
- Mitigating Synchronous I/O Overhead in File Systems on Open-Channel SSDs.
- Correlation-Aware Stripe Organization for Efficient Writes in Erasure-Coded Storage: Algorithms and Evaluation.
Zhirong Shen,
Patrick PC Lee,
Jiwu Shu,
Wenzhong Guo
IEEE Transactions on Parallel and Distributed Systems
(TPDS),
2019
Paper
- A Flattened Metadata Service for Distributed File Systems.
Siyang Li,
Fenlin Liu,
Jiwu Shu,
Youyou Lu,
Tao Li,
Yang Hu
IEEE Transactions on Parallel and Distributed Systems
(TPDS),
2018
Paper
- Efficient and Consistent NVMM Cache for SSD-based File System.
- HiNFS: A Persistent Memory File System with Both Buffering and Direct-Access.
- Encoding-Aware Data Placement for Efficient Degraded Reads in XOR-Coded Storage Systems: Algorithms and Evaluation.
Zhirong Shen,
Patrick PC Lee,
Jiwu Shu,
Wenzhong Guo
IEEE Transactions on Parallel and Distributed Systems
(TPDS),
2018
Paper
- FlashKV: Accelerating KV performance with open-channel SSDs.
Jiacheng Zhang,
Youyou Lu,
Jiwu Shu,
Xiongjun Qin
ACM Transactions on Embedded Computing Systems
(TECS),
2017
Paper
- Seek-efficient I/O optimization in single failure recovery for XOR-coded storage systems.
- Short code: An efficient RAID-6 MDS code for optimizing degraded reads and partial stripe writes.
- Parity-switched data placement: Optimizing partial stripe writes in XOR-coded storage systems.
- HV code: An all-around MDS code for RAID-6 storage systems.
- Reconsidering single disk failure recovery for erasure coded storage systems: Optimizing load balancing in stack-level.
- Blurred persistence: Efficient transactions in persistent memory.
- Supporting system consistency with differential transactions in flash-based SSDs.
- CaCo: An efficient Cauchy coding approach for cloud storage systems.
- High-performance and lightweight transaction support in flash-based SSDs.
- Redistribute Data to Regain Load Balance during RAID-4 Scaling.
- Design and implementation of an asymmetric block-based parallel file system.
Letian Yi,
Jiwu Shu,
Ying Zhao,
Yingjin Qian,
Youyou Lu,
Weimin Zheng
IEEE Transactions on Computers
(TC),
2014
Paper
- Generalized X-code: An efficient RAID-6 code for arbitrary size of disk array.
Xianghong Luo,
Jiwu Shu
ACM Transactions on Storage
(TOS),
2012
Paper
- Preventing Silent Data Corruptions from Propagating During Data Reconstruction.
Mingqiang Li,
Jiwu Shu
IEEE Transactions on Computers
(TC),
2010
Paper
- SOPA: Selecting the Optimal Policy Adaptively for a cache system.
- DACO: A High Performance Disk Architecture Designed Specially for Large Scale Erasure Coded Storage Systems.
Mingqiang Li,
Jiwu Shu
IEEE Transactions on Computers
(TC),
2010
Paper
- ALV: A New Data Redistribution Approach to RAID-5 Scaling.
- GRID Codes: Strip-based Erasure Codes with High Fault Tolerance for Storage Systems.
Mingqiang Li,
Jiwu Shu,
Weimin Zheng
ACM Transactions on Storage
(TOS),
2009
Paper
- SLAS: An Efficient Approach to Scaling Round-robin Striped Volumes.
- Design and Implementation of an Out-of-Band Virtualization System for Large SANs.
- Design and Implementation of a SAN System Based on the Fiber Channel Protocol.
Jiwu Shu,
Bigang Li,
Weimin Zheng
IEEE Transactions on Computers
(TC),
2005
Paper
- A Parallel Transient Stability Simulation for Power System.
Jiwu Shu,
Wei Xue,
Weimin Zheng
IEEE Transactions on Power Systems
(TOPS),
2005
Paper
Full Publications