Minhui Xie

Ph.D. Student

Department of Computer Science and Technology

Room 8-201, East Main Building, Tsinghua University, Beijing, China

Email: xmh19 AT mails dot tsinghua dot edu dot cn

About Me

I am Minhui Xie, a fifth-year Ph.D. student at Tsinghua University, advised by Professors Youyou Lu and Jiwu Shu. I am a systems researcher. My research focuses on building efficient systems for at-scale machine learning with emerging hardware (e.g., persistent memory and modern GPUs). I am excited about the intersection of ML and systems.

Education

Ph.D.

Department of Computer Science and Technology, Tsinghua University

2019 - present

B.S.

Department of Computer Science, Nanjing University
GPA 4.84/5.00
Rank: 1st/160 (core courses), 3rd/160 (all courses)

2015 - 2019

Publications

GPreempt: GPU Preemptive Scheduling Made General and Efficient.

Ruwen Fan, Tingxu Ren, Minhui Xie, Shiwei Gao, Jiwu Shu, Youyou Lu

USENIX Annual Technical Conference (USENIX ATC'25), 2025
Paper
Frugal: Efficient and Economic Embedding Model Training with Commodity GPUs.

Minhui Xie, Shaoxun Zeng, Hao Guo, Shiwei Gao, Youyou Lu

The 30th Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'25), 2025
Paper
Medusa: Accelerating Serverless LLM Inference with Materialization.

Shaoxun Zeng, Minhui Xie, Shiwei Gao, Youmin Chen, Youyou Lu

The 30th Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'25), 2025
Paper
MaxEmbed: Maximizing SSD Bandwidth Utilization for Huge Embedding Models Serving.

Ruwen Fan, Minhui Xie, Haodi Jiang, Youyou Lu

The 29th Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'24), 2024
Paper
Challenges and Technical Development of Large Model Training Storage Systems.

Yangyang Feng, Qing Wang, Minhui Xie, Jiwu Shu

Journal of Computer Research and Development (计算机研究与发展), 2024
Paper
PetPS: Supporting Huge Embedding Models with Persistent Memory.

Minhui Xie, Youyou Lu, Qing Wang, Yangyang Feng, Jiaqiang Liu, Kai Ren, Jiwu Shu

The 49th International Conference on Very Large Data Bases (VLDB'23), 2023
Paper Slides
Citron: Distributed Range Lock Management with One-sided RDMA.

Jian Gao, Youyou Lu, Minhui Xie, Qing Wang, Jiwu Shu

The 21st USENIX Conference on File and Storage Technologies (FAST'23), 2023
Paper
Patronus: High-Performance and Protective Remote Memory.

Bin Yan, Youyou Lu, Qing Wang, Minhui Xie, Jiwu Shu

The 21st USENIX Conference on File and Storage Technologies (FAST'23), 2023
Paper Slides
Mobius: Fine Tuning Large-scale Models on Commodity GPU Servers.

Yangyang Feng, Minhui Xie, Zijie Tian, Shuo Wang, Youyou Lu, Jiwu Shu

The 28th Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), 2023
Paper Slides
A Recommendation Model Inference System with GPU Direct Storage Access.

Minhui Xie, Youyou Lu, Yangyang Feng, Jiwu Shu

Journal of Computer Research and Development (计算机研究与发展), 2024
Paper
Pacman: An Efficient Compaction Approach for Log-Structured Key-Value Store on Persistent Memory.

Jing Wang, Youyou Lu, Qing Wang, Minhui Xie, Keji Huang, Jiwu Shu

USENIX Annual Technical Conference (USENIX ATC'22), 2022
Paper Slides
Fleche: An Efficient GPU Embedding Cache for Personalized Recommendations.

Minhui Xie, Youyou Lu, Jiazhen Lin, Qing Wang, Jian Gao, Kai Ren, Jiwu Shu

The 17th European Conference on Computer Systems (EuroSys'22), 2022
Paper Slides
Nap: Persistent Memory Indexes for NUMA Architectures.

Qing Wang, Youyou Lu, Junru Li, Minhui Xie, Jiwu Shu

ACM Transactions on Storage (TOS), 2022
Paper
Kraken: Memory Efficient Continual Learning for Large-Scale Real-Time Recommendations.

Minhui Xie, Kai Ren, Youyou Lu, Guangxu Yang, Qingxing Xu, Bihai Wu, Jiazhen Lin, Hongbo Ao, Wanhong Xu, Jiwu Shu

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'20), 2020
Paper Slides

Projects

PetPS - supporting huge embedding models with persistent memory

2020-2022

PetPS is the first system to apply byte-addressable NVM technology to reduce the storage cost of huge embedding models. It addresses the key problems caused by the poor performance of NVM hardware.
It has been deployed in Kuaishou’s datacenters, where it serves over 26 billion video recommendations every day, and it achieves 30% cost savings while maintaining service performance.
It has been covered by industry and official media, including Tsinghua, Kuaishou, Intel, and People.cn.

Fleche - efficient GPU-resident embedding cache

2021

In this work, we identify the DRAM bandwidth scarcity problem and propose Fleche to address it. Fleche’s key idea is to absorb hot accesses with a lightweight GPU-resident embedding cache.
Fleche achieves up to 4.0x higher end-to-end inference throughput than NVIDIA HugeCTR, a well-known and highly optimized industrial system.

Kraken - memory efficient continual learning for at-scale recommendation systems

2019-2020

Kraken redesigns the long-standing structure of embedding tables for continual learning and tailors the optimizer algorithm to make thrifty use of DRAM. It can cut memory usage to about a third while preserving model performance.
It has been cited and highly rated by companies including Facebook, Tencent, Alibaba, ByteDance, Kuaishou, and Huawei. It is also covered in OpenMLSys, a popular open-source MLSys book on GitHub.

Grants & Awards

Awards During Ph.D.

Huawei Scholarship

2023

Longfor Scholarship

2023

Ganzhou Scholarship

2022

Longfor Scholarship

2022

Tsinghua First-class Scholarship

2021

Student Grant from USENIX FAST

2021

Selected Awards Before Ph.D.

Outstanding Graduate of Nanjing University

2019

Tung OOCL Scholarship (5%)

2018

National Scholarship (2%)

2017

National Second Prize, China Undergraduate Mathematical Contest in Modeling

2017

Meritorious Winner, MCM/ICM

2017

Tung OOCL Scholarship (5%)

2016

Excellent student at Nanjing University (5%)

2016

Services

EuroSys 2023, Artifact reviewer
SIGCOMM 2022, Artifact reviewer
USENIX ATC 2022, Artifact reviewer
OSDI 2022, Artifact reviewer
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2022, Reviewer
EuroSys 2022, Artifact reviewer
Long-term volunteer of ChinaSys

Invited Talks

Fleche - efficient GPU-resident embedding cache
- NVIDIA, Beijing, China - May 26, 2022
- EuroSys’22, Rennes, France - Apr 04, 2022
Kraken - memory efficient continual learning for at-scale recommendation systems
- Huawei, Beijing, China - Mar 25, 2022
- Tsinghua, Beijing, China - Nov 19, 2020
- SC’20, San Diego, US - Nov 11, 2020

Teaching

TA, Computer Organization and Architecture, Tsinghua University, Spring 2022
TA, Computer Organization and Architecture, Tsinghua University, Spring 2021
TA, Computer Organization and Architecture, Tsinghua University, Spring 2020
TA, Introduction to Computer Systems, Nanjing University, Fall 2017