Full Publication List
2025
- Frugal: Efficient and Economic Embedding Model Training with Commodity GPUs.
The 30th Conference on Architectural Support for Programming Languages and Operating Systems
(ASPLOS'25),
2025
Paper
- Medusa: Accelerating Serverless LLM Inference with Materialization.
The 30th Conference on Architectural Support for Programming Languages and Operating Systems
(ASPLOS'25),
2025
Paper
- Achieving Wire-Latency Storage Systems by Exploiting Hardware ACKs.
The 22nd USENIX Symposium on Networked Systems Design and Implementation
(NSDI'25),
2025
Paper
- Fast State Restoration in LLM Serving with HCache.
The 20th European Conference on Computer Systems
(EuroSys'25),
2025
Paper
- Deft: A Scalable Tree Index for Disaggregated Memory.
The 20th European Conference on Computer Systems
(EuroSys'25),
2025
Paper
2024
- Efficiently Enlarging RDMA-Attached Memory with SSD.
ACM Transactions on Storage
(TOS),
2024
Paper
- Fast Core Scheduling with Userspace Process Abstraction.
The 30th ACM Symposium on Operating Systems Principles
(SOSP'24),
2024
Paper
- Ares-Flash: Efficient Parallel Integer Arithmetic Operations Using NAND Flash Memory.
57th Annual IEEE/ACM International Symposium on Microarchitecture
(MICRO-57),
2024
Paper
- MaxEmbed: Maximizing SSD Bandwidth Utilization for Huge Embedding Models Serving.
The 29th Conference on Architectural Support for Programming Languages and Operating Systems
(ASPLOS'24),
2024
Paper
- Volley: Accelerating Write-Read Orders in Disaggregated Storage.
The 19th European Conference on Computer Systems
(EuroSys'24),
2024
Paper
- Exploring the Asynchrony of Slow Memory Filesystem with EasyIO.
The 19th European Conference on Computer Systems
(EuroSys'24),
2024
Paper
- TeRM: Extending RDMA-Attached Memory with SSD.
The 22nd USENIX Conference on File and Storage Technologies
(FAST'24),
2024
Paper
Code
- Perseid: A Secondary Indexing Mechanism for LSM-based Storage Systems.
ACM Transactions on Storage
(TOS),
2024
Paper
2023
- Revisiting Secondary Indexing in LSM-based Storage Systems with Persistent Memory.
USENIX Annual Technical Conference
(USENIX ATC'23),
2023
Paper
Code
- SingularFS: A Billion-Scale Distributed File System Using a Single Metadata Server.
USENIX Annual Technical Conference
(USENIX ATC'23),
2023
Paper
- PetPS: Supporting Huge Embedding Models with Persistent Memory.
The 49th International Conference on Very Large Data Bases
(VLDB'23),
2023
Paper
Slides
Code
- λ-IO: A Unified IO Stack for Computational Storage.
The 21st USENIX Conference on File and Storage Technologies
(FAST'23),
2023
Paper
Code
- Citron: Distributed Range Lock Management with One-sided RDMA.
The 21st USENIX Conference on File and Storage Technologies
(FAST'23),
2023
Paper
- Patronus: High-Performance and Protective Remote Memory.
The 21st USENIX Conference on File and Storage Technologies
(FAST'23),
2023
Paper
Slides
Code
- Mobius: Fine Tuning Large-scale Models on Commodity GPU Servers.
The 28th Conference on Architectural Support for Programming Languages and Operating Systems
(ASPLOS'23),
2023
Paper
Slides
- Replicating Persistent Memory Key-Value Stores with Efficient RDMA Abstraction.
The 17th USENIX Symposium on Operating Systems Design and Implementation
(OSDI'23),
2023
Paper
- RIO: Order-Preserving and CPU-Efficient Remote Storage Access.
The 18th European Conference on Computer Systems
(EuroSys'23),
2023
Paper
Slides
- Building Write-Optimized Tree Indexes on Disaggregated Memory.
SIGMOD Record, March 2023 (Vol. 52, No. 1)
(SIGMOD Record),
2023
Paper
- TH-iSSD: Design and Implementation of a Generic and Reconfigurable Near-Data Processing Framework.
ACM Transactions on Embedded Computing Systems
(TECS),
2023
Paper
- Efficient Crash Consistency for NVMe over PCIe and RDMA.
ACM Transactions on Storage
(TOS),
2023
Paper
- NICFS: a file system based on persistent memory and SmartNIC.
Frontiers of Information Technology & Electronic Engineering
(FITEE'23),
2023
Paper
2022
- SwitchTx: Scalable In-Network Coordination for Distributed Transaction Processing.
Proceedings of The 48th International Conference on Very Large Data Bases
(VLDB'22),
2022
Paper
Slides
- Pacman: An Efficient Compaction Approach for Log-Structured Key-Value Store on Persistent Memory.
USENIX Annual Technical Conference
(USENIX ATC'22),
2022
Paper
Slides
Code
- AlNiCo: SmartNIC-accelerated Contention-aware Request Scheduling for Transaction Processing.
USENIX Annual Technical Conference
(USENIX ATC'22),
2022
Paper
- Fleche: An Efficient GPU Embedding Cache for Personalized Recommendations.
The 17th European Conference on Computer Systems
(EuroSys'22),
2022
Paper
Slides
- InfiniFS: An Efficient Metadata Service for Large-Scale Distributed Filesystems.
USENIX Conference on File and Storage Technologies
(FAST'22),
2022
Paper
- Plor: General Transactions with Predictable, Low Tail Latency.
ACM SIGMOD International Conference on Management of Data
(SIGMOD'22),
2022
Paper
- Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory.
ACM SIGMOD International Conference on Management of Data
(SIGMOD'22),
2022
Paper
Slides
Code
- Nap: Persistent Memory Indexes for NUMA Architectures.
ACM Transactions on Storage
(TOS),
2022
Paper
- Reprogramming 3D TLC Flash Memory based Solid State Drives.
ACM Transactions on Storage
(TOS),
2022
Paper
- Efficient Atomic Durability on eADR-enabled Persistent Memory.
The 31st International Conference on Parallel Architectures and Compilation Techniques
(PACT'22),
2022
Paper
2021
- Crash Consistent Non-Volatile Memory Express.
The 28th ACM Symposium on Operating Systems Principles
(SOSP'21),
2021
Paper
Slides
Code
- ParaBit: Processing Parallel Bitwise Operations in NAND Flash Memory based SSDs.
54th Annual IEEE/ACM International Symposium on Microarchitecture
(MICRO'21),
2021
Paper
- Max: A Multicore-Accelerated File System for Flash Storage.
USENIX Annual Technical Conference
(USENIX ATC'21),
2021
Paper
Slides
Code
- Nap: A Black-Box Approach to NUMA-Aware Persistent Memory Indexes.
The 15th USENIX Symposium on Operating Systems Design and Implementation
(OSDI'21),
2021
Paper
Code
- Aria: Tolerating Skewed Workloads in Secure In-memory Key-value Stores.
37th IEEE International Conference on Data Engineering
(ICDE'21),
2021
Paper
- Scalable Persistent Memory File System with Kernel-Userspace Collaboration.
USENIX Conference on File and Storage Technologies
(FAST'21),
2021
Paper
- Concordia: Distributed Shared Memory with In-Network Cache Coherence.
USENIX Conference on File and Storage Technologies
(FAST'21),
2021
Paper
- Octopus+: an RDMA-enabled Distributed Persistent Memory File System.
ACM Transactions on Storage
(TOS),
2021
Paper
- LrGAN: A Compact and Energy Efficient PIM-based Architecture for GAN Training.
IEEE Transactions on Computers
(TC),
2021
Paper
- Pattern-Guided File Compression with User-Experience Enhancement for Log-Structured File System on Mobile Devices.
Cheng Ji,
Li-Pin Chang,
Riwei Ran,
Chao Wu,
Congming Gao,
Liang Shi,
Tei-Wei Kuo,
Chun Jason Xue,
USENIX Conference on File and Storage Technologies
(FAST'21),
2021
Paper
2020
- Kraken: Memory Efficient Continual Learning for Large-Scale Real-Time Recommendations.
Minhui Xie,
Kai Ren,
Youyou Lu,
Guangxu Yang,
Qingxing Xu,
Bihai Wu,
Jiazhen Lin,
Hongbo Ao,
Wanhong Xu,
Jiwu Shu,
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
(SC'20),
2020
Paper
Slides
Code
- Write Dependency Disentanglement with HORAE.
The 14th USENIX Symposium on Operating Systems Design and Implementation
(OSDI'20),
2020
Paper
Slides
- μTree: a Persistent B+-Tree with Low Tail Latency.
46th International Conference on Very Large Data Bases
(VLDB'20),
2020
Paper
Slides
- Improving the Concurrency Performance of Persistent Memory Transactions on Multicores.
Design Automation Conference
(DAC'20),
2020
Paper
- CoinPurse: A Device-Assisted File System with Dual Interfaces.
Design Automation Conference
(DAC'20),
2020
Paper
- FlatStore: an Efficient Log-Structured Key-Value Storage Engine for Persistent Memory.
Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
(ASPLOS'20),
2020
Paper
Slides
- TH-DPMS: Design and Implementation of an RDMA-enabled Distributed Persistent Memory Storage System.
ACM Transactions on Storage
(TOS),
2020
Paper
- Towards Unaligned Writes Optimization in Cloud Storage with High-performance SSDs.
IEEE Transactions on Parallel and Distributed Systems
(TPDS),
2020
Paper
- ShieldNVM: An Efficient and Fast Recoverable System for Secure Non-Volatile Memory.
ACM Transactions on Storage
(TOS),
2020
Paper
- Cross-Rack-Aware Single Failure Recovery for Clustered File Systems.
IEEE Transactions on Dependable and Secure Computing
(TDSC),
2020
Paper
- OCVM: Optimizing the Isolation of Virtual Machines with Open-Channel SSDs.
International Conference on Algorithms and Architectures for Parallel Processing
(ICA3PP'20),
2020
Paper
- Understanding and analysis of B+ trees on NVM towards consistency and efficiency.
CCF Transactions on High Performance Computing
(CCF-THPC),
2020
Paper
- SineKV: Decoupled Secondary Indexing for LSM-based Key-Value Stores.
40th IEEE International Conference on Distributed Computing Systems
(ICDCS'20),
2020
Paper
- NovKV: Efficient Garbage Collection for Key-Value Separated LSM-Stores.
36th International Conference on Massive Storage Systems and Technology
(MSST'20),
2020
Paper
- DIESEL: A Dataset-Based Distributed Storage and Caching System for Large-Scale Deep Learning Training.
Lipeng Wang,
Songgao Ye,
Baichen Yang,
Youyou Lu,
Hequan Zhang,
Shengen Yan,
Qiong Luo,
49th International Conference on Parallel Processing
(ICPP'20),
2020
Paper
2019
- OCStore: Accelerating Distributed Object Storage with Open-Channel SSDs.
The 39th IEEE International Conference on Distributed Computing Systems
(ICDCS'19),
2019
Paper
- Cognitive SSD: A Deep Learning Engine for Energy-Efficient Data Retrieval.
USENIX Annual Technical Conference
(USENIX ATC'19),
2019
Paper
- No Compromises: Secure NVM with Crash Consistency, Write-Efficiency and High-Performance.
Design Automation Conference
(DAC'19),
2019
Paper
- ASCache: An Approximate SSD Cache for Error-Tolerant Applications.
Design Automation Conference
(DAC'19),
2019
Paper
- Scalable RDMA RPC on Reliable Connection with Efficient Resource Sharing.
Proceedings of the Fourteenth EuroSys Conference
(EuroSys'19),
2019
Paper
Slides
- Mitigating Synchronous I/O Overhead in File Systems on Open-Channel SSDs.
ACM Transactions on Storage
(TOS),
2019
Paper
- Reducing rename overhead in full-path-indexed file system.
Advanced Parallel Processing Technologies, 13th International Symposium
(APPT'19),
2019
Paper
- Correlation-Aware Stripe Organization for Efficient Writes in Erasure-Coded Storage: Algorithms and Evaluation.
IEEE Transactions on Parallel and Distributed Systems
(TPDS),
2019
Paper
2018
- LerGAN: A Zero-Free, Low Data Movement and PIM-Based GAN Architecture.
51st Annual IEEE/ACM International Symposium on Microarchitecture
(MICRO'18),
2018
Paper
- Exporting Transactional Interface to Applications in Log-Structured File Systems.
IEEE International Conference on Networking, Architecture and Storage
(NAS'18),
2018
Paper
- Performance analysis on structure of racetrack memory.
Hongbin Zhang,
Chao Zhang,
Qingda Hu,
Chengmo Yang,
Jiwu Shu,
23rd Asia and South Pacific Design Automation Conference
(ASP-DAC'18),
2018
Paper
- Empirical Study of Transactional Management for Persistent Memory.
IEEE 7th Non-Volatile Memory Systems and Applications Symposium
(NVMSA'18),
2018
Paper
- A Flattened Metadata Service for Distributed File Systems.
IEEE Transactions on Parallel and Distributed Systems
(TPDS),
2018
Paper
- Accelerating breadth-first graph search on a single server by dynamic edge trimming.
Journal of Parallel and Distributed Computing
(JPDC),
2018
Paper
- Efficient and Consistent NVMM Cache for SSD-based File System.
IEEE Transactions on Computers
(TC),
2018
Paper
- HiNFS: A Persistent Memory File System with Both Buffering and Direct-Access.
ACM Transactions on Storage
(TOS),
2018
Paper
- Encoding-Aware Data Placement for Efficient Degraded Reads in XOR-Coded Storage Systems: Algorithms and Evaluation.
IEEE Transactions on Parallel and Distributed Systems
(TPDS),
2018
Paper
2017
- LocoFS: A loosely-coupled metadata service for distributed file systems.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
(SC'17),
2017
Paper
- Correlation-Aware Stripe Organization for Efficient Writes in Erasure-Coded Storage Systems.
IEEE 36th Symposium on Reliable Distributed Systems
(SRDS'17),
2017
Paper
- Efficient storage management for aged file systems on persistent memory.
Proceedings of the Conference on Design, Automation & Test in Europe
(DATE'17),
2017
Paper
- Protect non-volatile memory from wear-out attack based on timing difference of row buffer hit/miss.
Proceedings of the Conference on Design, Automation & Test in Europe
(DATE'17),
2017
Paper
- Log-structured non-volatile main memory.
Qingda Hu,
Jinglei Ren,
Anirudh Badam,
Jiwu Shu,
Thomas Moscibroda,
USENIX Annual Technical Conference
(USENIX ATC'17),
2017
Paper
- Octopus: an RDMA-enabled Distributed Persistent Memory File System.
USENIX Annual Technical Conference
(USENIX ATC'17),
2017
Paper
- FlashKV: Accelerating KV performance with open-channel SSDs.
ACM Transactions on Embedded Computing Systems
(TECS),
2017
Paper
- Seek-efficient I/O optimization in single failure recovery for XOR-coded storage systems.
IEEE Transactions on Parallel and Distributed Systems
(TPDS),
2017
Paper
- Short code: An efficient RAID-6 MDS code for optimizing degraded reads and partial stripe writes.
IEEE Transactions on Computers
(TC),
2017
Paper
2016
- AsyncStripe: I/O efficient asynchronous graph computing on a single server.
Proceedings of the Eleventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
(CODES+ISSS'16),
2016
Paper
- Encoding-aware data placement for efficient degraded reads in XOR-coded storage systems.
IEEE 35th Symposium on Reliable Distributed Systems
(SRDS'16),
2016
Paper
- Making Cold Data Identification Efficient in Non-volatile Memory Systems.
Asia-Pacific Web Conference
(APWeb'16),
2016
Paper
- Empirical study of redo and undo logging in persistent memory.
5th Non-Volatile Memory Systems and Applications Symposium
(NVMSA'16),
2016
Paper
- Run-time performance estimation and fairness-oriented scheduling policy for concurrent GPGPU applications.
45th International Conference on Parallel Processing
(ICPP'16),
2016
Paper
- Reconsidering single failure recovery in clustered file systems.
46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
(DSN'16),
2016
Paper
- Efficient routing for cooperative data regeneration in heterogeneous storage networks.
IEEE/ACM 24th International Symposium on Quality of Service
(IWQoS'16),
2016
Paper
- HW/SW co-design of nonvolatile IO system in energy harvesting sensor nodes for optimal data acquisition.
Zewei Li,
Yongpan Liu,
Daming Zhang,
Chun Jason Xue,
Zhangyuan Wang,
Xin Shi,
Wenyu Sun,
Jiwu Shu,
Huazhong Yang,
53rd ACM/EDAC/IEEE Design Automation Conference
(DAC'16),
2016
Paper
- FastBFS: Fast breadth-first graph search on a single server.
IEEE International Parallel and Distributed Processing Symposium
(IPDPS'16),
2016
Paper
- Exploring main memory design based on racetrack memory technology.
Qingda Hu,
Guangyu Sun,
Jiwu Shu,
Chao Zhang,
Proceedings of the 26th edition on Great Lakes Symposium on VLSI
(GLS-VLSI'16),
2016
Paper
- Fast and failure-consistent updates of application data in non-volatile main memory file system.
32nd Symposium on Mass Storage Systems and Technologies
(MSST'16),
2016
Paper
- A high performance file system for non-volatile main memory.
Proceedings of the Eleventh European Conference on Computer Systems
(EuroSys'16),
2016
Paper
- Pin tumbler lock: A shift based encryption mechanism for racetrack memory.
Hongbin Zhang,
Chao Zhang,
Xian Zhang,
Guangyu Sun,
Jiwu Shu,
21st Asia and South Pacific Design Automation Conference
(ASP-DAC'16),
2016
Paper
- ParaFS: A log-structured file system to exploit the internal parallelism of flash devices.
USENIX Annual Technical Conference
(USENIX ATC'16),
2016
Paper
- Parity-switched data placement: Optimizing partial stripe writes in XOR-coded storage systems.
IEEE Transactions on Parallel and Distributed Systems
(TPDS),
2016
Paper
- HV Code: An all-around MDS code for RAID-6 storage systems.
IEEE Transactions on Parallel and Distributed Systems
(TPDS),
2016
Paper
- Reconsidering single disk failure recovery for erasure coded storage systems: Optimizing load balancing in stack-level.
IEEE Transactions on Parallel and Distributed Systems
(TPDS),
2016
Paper
- Blurred persistence: Efficient transactions in persistent memory.
ACM Transactions on Storage
(TOS),
2016
Paper
- Supporting system consistency with differential transactions in flash-based SSDs.
IEEE Transactions on Computers
(TC),
2016
Paper
- CaCo: An efficient Cauchy coding approach for cloud storage systems.
IEEE Transactions on Computers
(TC),
2016
Paper
2015
- Sky: Opinion Dynamics Based Consensus for P2P Network with Trust Relationships.
International Conference on Algorithms and Architectures for Parallel Processing
(ICA3PP'15),
2015
Paper
- EC-FRM: An erasure coding framework to speed up reads for erasure coded cloud storage systems.
44th International Conference on Parallel Processing
(ICPP'15),
2015
Paper
- Exploring data placement in racetrack memory based scratchpad memory.
IEEE Non-Volatile Memory System and Applications Symposium
(NVMSA'15),
2015
Paper
- Blurred persistence in transactional persistent memory.
31st Symposium on Mass Storage Systems and Technologies
(MSST'15),
2015
Paper
- D-Code: An efficient RAID-6 code to optimize I/O loads and read performance.
IEEE International Parallel and Distributed Processing Symposium
(IPDPS'15),
2015
Paper
- DP²: reducing transaction overhead with differential and dual persistency in persistent memory.
Proceedings of the 12th ACM International Conference on Computing Frontiers
(CF'15),
2015
Paper
- High-performance and lightweight transaction support in flash-based SSDs.
IEEE Transactions on Computers
(TC),
2015
Paper
- Redistribute Data to Regain Load Balance during RAID-4 Scaling.
IEEE Transactions on Parallel and Distributed Systems
(TPDS),
2015
Paper
2014
- Loose-ordering consistency for persistent memory.
IEEE 32nd International Conference on Computer Design
(ICCD'14),
2014
Paper
- ReconFS: A reconstructable file system on flash storage.
Proceedings of the 12th USENIX Conference on File and Storage Technologies
(FAST'14),
2014
Paper
- Design and implementation of an asymmetric block-based parallel file system.
IEEE Transactions on Computers
(TC),
2014
Paper
2013
- Aegis: Partitioning data block for efficient recovery of stuck-at-faults in phase change memory.
Jie Fan,
Song Jiang,
Jiwu Shu,
Youhui Zhang,
Weimin Zheng,
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
(MICRO'13),
2013
Paper
- LightTx: A lightweight transactional design in flash-based SSDs to support flexible transactions.
IEEE 31st International Conference on Computer Design
(ICCD'13),
2013
Paper
- Extending the lifetime of flash-based storage through reducing write amplification from file systems.
Proceedings of the 11th USENIX Conference on File and Storage Technologies
(FAST'13),
2013
Paper
2012
- Generalized X-code: An efficient RAID-6 code for arbitrary size of disk array.
ACM Transactions on Storage
(TOS),
2012
Paper
2010
- Preventing Silent Data Corruptions from Propagating During Data Reconstruction.
IEEE Transactions on Computers
(TC),
2010
Paper
- SOPA: Selecting the Optimal Policy Adaptively for a cache system.
ACM Transactions on Storage
(TOS),
2010
Paper
- DACO: A High Performance Disk Architecture Designed Specially for Large Scale Erasure Coded Storage Systems.
IEEE Transactions on Computers
(TC),
2010
Paper
- ALV: A New Data Redistribution Approach to RAID-5 Scaling.
IEEE Transactions on Computers
(TC),
2010
Paper
2009
- GRID Codes: Strip-based Erasure Codes with High Fault Tolerance for Storage Systems.
ACM Transactions on Storage
(TOS),
2009
Paper
2007
- SLAS: An Efficient Approach to Scaling Round-robin Striped Volumes.
ACM Transactions on Storage
(TOS),
2007
Paper
- Design and Implementation of an Out-of-Band Virtualization System for Large SANs.
IEEE Transactions on Computers
(TC),
2007
Paper
2005
- Design and Implementation of a SAN System Based on the Fiber Channel Protocol.
IEEE Transactions on Computers
(TC),
2005
Paper
- A Parallel Transient Stability Simulation for Power System.
IEEE Transactions on Power Systems
(TOPS),
2005
Paper