Minhui Xie

Ph.D. Student

Storage Research Group

Department of Computer Science and Technology

Tsinghua University

Room 8-201, East Main Building, Tsinghua University, Beijing, China

Email: xmh19 AT mails dot tsinghua dot edu dot cn

About Me

I am Minhui Xie, a fifth-year Ph.D. student at Tsinghua University, advised by Professors Youyou Lu and Jiwu Shu. I am a systems researcher; my research focuses on building efficient systems for at-scale machine learning with emerging hardware (e.g., persistent memory, modern GPUs). I am excited about the intersection of ML and systems.

Education

Department of Computer Science, Tsinghua University

2019 - present


Department of Computer Science, Nanjing University
GPA 4.84/5.00
Rank 1st/160 (core courses), 3rd/160 (all courses)

2015 - 2019

Publications

  • PetPS: Supporting Huge Embedding Models with Persistent Memory.
    Minhui Xie, Youyou Lu, Qing Wang, Yangyang Feng, Jiaqiang Liu, Kai Ren, Jiwu Shu,
    The 49th International Conference on Very Large Data Bases (VLDB'23), 2023
    Paper Slides Star
  • Citron: Distributed Range Lock Management with One-sided RDMA.
    Jian Gao, Youyou Lu, Minhui Xie, Qing Wang, Jiwu Shu,
    The 21st USENIX Conference on File and Storage Technologies (FAST'23), 2023
  • Patronus: High-Performance and Protective Remote Memory.
    Bin Yan, Youyou Lu, Qing Wang, Minhui Xie, Jiwu Shu,
    The 21st USENIX Conference on File and Storage Technologies (FAST'23), 2023
    Paper Slides Star
  • Mobius: Fine Tuning Large-scale Models on Commodity GPU Servers.
    Yangyang Feng, Minhui Xie, Zijie Tian, Shuo Wang, Youyou Lu, Jiwu Shu,
    The 28th Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), 2023
    Paper Slides
  • Challenges and Technical Development of Large Model Training Storage Systems.
    Yangyang Feng, Qing Wang, Minhui Xie, Jiwu Shu,
    Journal of Computer Research and Development (计算机研究与发展), 2023
  • A Recommendation Model Inference System with GPU Direct Storage Access.
    Minhui Xie, Youyou Lu, Yangyang Feng, Jiwu Shu,
    Journal of Computer Research and Development (计算机研究与发展), 2023
  • Pacman: An Efficient Compaction Approach for Log-Structured Key-Value Store on Persistent Memory.
    Jing Wang, Youyou Lu, Qing Wang, Minhui Xie, Keji Huang, Jiwu Shu,
    USENIX Annual Technical Conference (USENIX ATC'22), 2022
    Paper Slides Star
  • Fleche: An Efficient GPU Embedding Cache for Personalized Recommendations.
    Minhui Xie, Youyou Lu, Jiazhen Lin, Qing Wang, Jian Gao, Kai Ren, Jiwu Shu,
    The 17th European Conference on Computer Systems (EuroSys'22), 2022
    Paper Slides
  • Nap: Persistent Memory Indexes for NUMA Architectures.
    Qing Wang, Youyou Lu, Junru Li, Minhui Xie, Jiwu Shu,
    ACM Transactions on Storage (TOS), 2022
  • Kraken: Memory Efficient Continual Learning for Large-Scale Real-Time Recommendations.
    Minhui Xie, Kai Ren, Youyou Lu, Guangxu Yang, Qingxing Xu, Bihai Wu, Jiazhen Lin, Hongbo Ao, Wanhong Xu, Jiwu Shu,
    Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'20), 2020
    Paper Slides Star

Research Highlights

PetPS - supporting huge embedding models with persistent memory


  • PetPS is the first system that applies byte-addressable NVM to reduce the storage cost of huge embedding models, and it addresses the core challenges posed by NVM hardware's limited performance.
  • It has been deployed in Kuaishou's datacenters, where it withstands the access pressure of over 26 billion video recommendations every day, achieving 30% cost savings while maintaining service performance.
  • It has been covered by industry and official media, including Tsinghua (link), Kuaishou, Intel (link), and People.cn (link).
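
To make the idea concrete, here is a minimal toy sketch of the general pattern: embedding rows live in a memory-mapped file (standing in for cheap but slower byte-addressable persistent memory), fronted by a small DRAM cache that absorbs hot reads. All class and method names are made up for illustration; this is not PetPS's actual design.

```python
import mmap
import struct
import tempfile

class PMParameterServer:
    """Toy sketch: a parameter server whose embedding table lives in a
    memory-mapped file (a stand-in for persistent memory), with a small
    DRAM cache masking the slow medium. Illustrative only."""

    def __init__(self, num_keys, dim, cache_size=1024):
        self.dim = dim
        self.row_bytes = dim * 8            # one float64 per dimension
        self.file = tempfile.TemporaryFile()
        self.file.truncate(num_keys * self.row_bytes)
        self.pm = mmap.mmap(self.file.fileno(), num_keys * self.row_bytes)
        self.cache = {}                     # small DRAM cache
        self.cache_size = cache_size

    def put(self, key, vec):
        off = key * self.row_bytes
        self.pm[off:off + self.row_bytes] = struct.pack(f"{self.dim}d", *vec)
        self.cache.pop(key, None)           # keep the cache coherent

    def get(self, key):
        if key in self.cache:               # fast path: DRAM hit
            return self.cache[key]
        off = key * self.row_bytes          # slow path: "PM" read
        vec = list(struct.unpack(f"{self.dim}d",
                                 self.pm[off:off + self.row_bytes]))
        if len(self.cache) < self.cache_size:
            self.cache[key] = vec
        return vec

ps = PMParameterServer(num_keys=10, dim=4)
ps.put(3, [1.0, 2.0, 3.0, 4.0])
```

The real problem, of course, is doing this under datacenter-scale load; the sketch only shows the storage-hierarchy split between cheap capacity and fast cache.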

Fleche - efficient GPU-resident embedding cache


  • In this work, we identify the DRAM bandwidth scarcity problem in embedding lookups and propose Fleche to address it. Fleche's key idea is to absorb hot accesses with a lightweight GPU-resident embedding cache.
  • Fleche achieves up to 4.0x higher end-to-end inference throughput than NVIDIA HugeCTR, a well-known and highly optimized industrial system.
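
The "absorb hot accesses" idea can be sketched with a plain CPU-side LRU cache in front of a large table; under a skewed access pattern, a cache holding 1% of the rows serves the vast majority of lookups. This is a toy illustration of the general principle, not Fleche's actual design or API.

```python
from collections import OrderedDict
import random

class EmbeddingCache:
    """Toy LRU cache absorbing hot embedding lookups (illustrative only)."""

    def __init__(self, capacity, backing_table):
        self.capacity = capacity
        self.backing_table = backing_table   # full table in slow memory
        self.cache = OrderedDict()           # key -> embedding row
        self.hits = self.misses = 0

    def lookup(self, key):
        if key in self.cache:                # fast path: cached row
            self.cache.move_to_end(key)
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        row = self.backing_table[key]        # slow path: full table
        self.cache[key] = row
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used
        return row

# Skewed traffic: 90% of lookups hit 50 hot keys out of 10,000.
table = {i: [float(i)] * 4 for i in range(10_000)}
cache = EmbeddingCache(capacity=100, backing_table=table)
random.seed(0)
for _ in range(10_000):
    hot = random.random() < 0.9
    cache.lookup(random.randint(0, 49) if hot else random.randint(0, 9_999))
hit_rate = cache.hits / (cache.hits + cache.misses)
```

Making such a cache efficient on a GPU (batched probes, kernel-level concurrency) is the hard part that the paper addresses.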

Kraken - memory efficient continual learning for at-scale recommendation systems


  • Kraken redesigns the age-old structure of embedding tables for continual learning and tailors the optimizer to make thrifty use of DRAM. It cuts memory usage to roughly one third while preserving model quality.
  • It has been cited and praised by researchers at companies including Facebook, Tencent, Alibaba, ByteDance, Kuaishou, and Huawei. It was also incorporated into OpenMLSys (link), a popular open-source MLSys book on GitHub.
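
Two of the levers such systems pull can be shown in a few lines: allocate embedding rows lazily and evict stale ones instead of pre-allocating the full ID space, and keep a single scalar of optimizer state per sparse row (AdaGrad-style) rather than two full vectors (Adam-style). The sketch below illustrates these general techniques under made-up names; it is not Kraken's implementation.

```python
import math

class ThriftyEmbedding:
    """Toy memory-thrifty embedding table for continual learning:
    lazy allocation, LRU eviction, scalar-state sparse optimizer."""

    def __init__(self, dim, max_rows):
        self.dim, self.max_rows = dim, max_rows
        self.rows = {}        # feature id -> embedding vector
        self.accum = {}       # feature id -> scalar gradient accumulator
        self.last_step = {}   # feature id -> step of last access
        self.step = 0

    def lookup(self, fid):
        self.step += 1
        if fid not in self.rows:
            if len(self.rows) >= self.max_rows:
                victim = min(self.last_step, key=self.last_step.get)
                for d in (self.rows, self.accum, self.last_step):
                    del d[victim]             # evict the coldest row
            self.rows[fid] = [0.0] * self.dim # lazy allocation
            self.accum[fid] = 0.0
        self.last_step[fid] = self.step
        return self.rows[fid]

    def update(self, fid, grad, lr=0.1):
        # Scalar AdaGrad: ~1 float of optimizer state per row,
        # versus 2 * dim floats for Adam.
        self.accum[fid] += sum(g * g for g in grad) / len(grad)
        scale = lr / (math.sqrt(self.accum[fid]) + 1e-8)
        for i, g in enumerate(grad):
            self.rows[fid][i] -= scale * g

table = ThriftyEmbedding(dim=4, max_rows=2)
for fid in (1, 2, 3):        # the third lookup evicts row 1
    table.lookup(fid)
table.update(3, [1.0, 1.0, 1.0, 1.0])
```

The memory saving comes from both ends: the table holds only live rows, and each live row carries one scalar of optimizer state instead of two extra vectors.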

Grants & Awards

Awards During Ph.D.

  • Huawei Scholarship


  • Longfor Scholarship


  • Ganzhou Scholarship


  • Longfor Scholarship


  • Tsinghua First-class Scholarship


  • Student Grant from USENIX FAST


Selected Awards Before Ph.D.

  • Outstanding Graduate of Nanjing University


  • Tung OOCL Scholarship (5%)


  • National Scholarship (2%)


  • National Second Prize, China Undergraduate Mathematical Contest in Modeling


  • Meritorious Winner, MCM/ICM


  • Tung OOCL Scholarship (5%)


  • Excellent Student of Nanjing University (5%)

Academic Services

  • EuroSys 2023, Artifact reviewer
  • SIGCOMM 2022, Artifact reviewer
  • USENIX ATC 2022, Artifact reviewer
  • OSDI 2022, Artifact reviewer
  • IEEE Transactions on Parallel and Distributed Systems (TPDS), 2022, Reviewer
  • EuroSys 2022, Artifact reviewer
  • ChinaSys, Long-term volunteer

Invited Talks

  • Fleche - efficient GPU-resident embedding cache
    • NVIDIA, Beijing, China - May 26, 2022
    • EuroSys’22, Rennes, France - Apr 04, 2022
  • Kraken - memory efficient continual learning for at-scale recommendation systems
    • Huawei, Beijing, China - Mar 25, 2022
    • Tsinghua, Beijing, China - Nov 19, 2020
    • SC’20, San Diego, US - Nov 11, 2020

Teaching

  • TA, Computer Organization and Architecture, Tsinghua University, Spring 2022
  • TA, Computer Organization and Architecture, Tsinghua University, Spring 2021
  • TA, Computer Organization and Architecture, Tsinghua University, Spring 2020
  • TA, Introduction to Computer System, Nanjing University, Fall 2017