Minhui Xie

Ph.D. Student

Storage Research Group

Department of Computer Science and Technology

Tsinghua University

Room 8-201, East Main Building, Tsinghua University, Beijing, China

Email: xmh19 AT mails dot tsinghua dot edu dot cn


About Me

I am Minhui Xie, a fifth-year Ph.D. student at Tsinghua University, advised by Professors Youyou Lu and Jiwu Shu. I am a systems researcher. My research focuses on building efficient systems for at-scale machine learning with emerging hardware (e.g., persistent memory, modern GPUs). I am excited about the intersection of ML and systems.


Education

Ph.D.

Department of Computer Science and Technology, Tsinghua University

2019 - present

B.S.

Department of Computer Science, Nanjing University
GPA 4.84/5.00
Rank 1st/160 (core courses), 3rd/160 (all courses)

2015 - 2019


Publications

  • Frugal: Efficient and Economic Embedding Model Training with Commodity GPUs.
    Minhui Xie, Shaoxun Zeng, Hao Guo, Shiwei Gao, Youyou Lu,
    The 30th Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'25), 2025
    Paper
  • Medusa: Accelerating Serverless LLM Inference with Materialization.
    Shaoxun Zeng, Minhui Xie, Shiwei Gao, Youmin Chen, Youyou Lu,
    The 30th Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'25), 2025
    Paper
  • MaxEmbed: Maximizing SSD Bandwidth Utilization for Huge Embedding Models Serving.
    Ruwen Fan, Minhui Xie, Haodi Jiang, Youyou Lu,
    The 29th Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'24), 2024
    Paper
  • Challenges and Technical Development of Large Model Training Storage Systems.
    Yangyang Feng, Qing Wang, Minhui Xie, Jiwu Shu,
    Journal of Computer Research and Development, 2024
    Paper
  • PetPS: Supporting Huge Embedding Models with Persistent Memory.
    Minhui Xie, Youyou Lu, Qing Wang, Yangyang Feng, Jiaqiang Liu, Kai Ren, Jiwu Shu,
    The 49th International Conference on Very Large Data Bases (VLDB'23), 2023
    Paper Slides Star
  • Citron: Distributed Range Lock Management with One-sided RDMA.
    Jian Gao, Youyou Lu, Minhui Xie, Qing Wang, Jiwu Shu,
    The 21st USENIX Conference on File and Storage Technologies (FAST'23), 2023
    Paper
  • Patronus: High-Performance and Protective Remote Memory.
    Bin Yan, Youyou Lu, Qing Wang, Minhui Xie, Jiwu Shu,
    The 21st USENIX Conference on File and Storage Technologies (FAST'23), 2023
    Paper Slides Star
  • Mobius: Fine Tuning Large-scale Models on Commodity GPU Servers.
    Yangyang Feng, Minhui Xie, Zijie Tian, Shuo Wang, Youyou Lu, Jiwu Shu,
    The 28th Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'23), 2023
    Paper Slides
  • A Recommendation Model Inference System with GPU Direct Storage Access.
    Minhui Xie, Youyou Lu, Yangyang Feng, Jiwu Shu,
    Journal of Computer Research and Development, 2024
    Paper
  • Pacman: An Efficient Compaction Approach for Log-Structured Key-Value Store on Persistent Memory.
    Jing Wang, Youyou Lu, Qing Wang, Minhui Xie, Keji Huang, Jiwu Shu,
    USENIX Annual Technical Conference (USENIX ATC'22), 2022
    Paper Slides Star
  • Fleche: An Efficient GPU Embedding Cache for Personalized Recommendations.
    Minhui Xie, Youyou Lu, Jiazhen Lin, Qing Wang, Jian Gao, Kai Ren, Jiwu Shu,
    The 17th European Conference on Computer Systems (EuroSys'22), 2022
    Paper Slides
  • Nap: Persistent Memory Indexes for NUMA Architectures.
    Qing Wang, Youyou Lu, Junru Li, Minhui Xie, Jiwu Shu,
    ACM Transactions on Storage (TOS), 2022
    Paper
  • Kraken: Memory Efficient Continual Learning for Large-Scale Real-Time Recommendations.
    Minhui Xie, Kai Ren, Youyou Lu, Guangxu Yang, Qingxing Xu, Bihai Wu, Jiazhen Lin, Hongbo Ao, Wanhong Xu, Jiwu Shu,
    Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'20), 2020
    Paper Slides Star

Projects

PetPS - supporting huge embedding models with persistent memory

2020-2022

  • PetPS is the first system that applies byte-addressable NVM technology to reduce the storage cost of huge embedding models. It addresses the key problems caused by the poor performance of NVM hardware.
  • It has been deployed in Kuaishou’s data centers, where it serves over 26 billion video recommendations every day. It achieves 30% cost savings while maintaining service performance.
  • It has been covered by industry and official media, including Tsinghua (link), Kuaishou, Intel (link), and People.cn (link).

Fleche - efficient GPU-resident embedding cache

2021

  • In this work, we identify the DRAM bandwidth scarcity problem and propose Fleche to address it. Fleche’s key idea is to absorb hot accesses with a lightweight GPU-resident embedding cache.
  • Fleche achieves up to a 4.0x speedup in end-to-end inference throughput over NVIDIA HugeCTR, a well-known, highly optimized industrial system.

Kraken - memory efficient continual learning for at-scale recommendation systems

2019-2020

  • Kraken redesigns the age-old structure of embedding tables for continual learning and tailors the optimizer algorithm to make thrifty use of DRAM. It can cut memory usage to one third while preserving model performance.
  • It has been cited and highly regarded by companies including Facebook, Tencent, Alibaba, ByteDance, Kuaishou, and Huawei. It was also incorporated into OpenMLSys (link), a popular open-source MLSys book on GitHub.

Grants & Awards

Awards During Ph.D.

  • Huawei Scholarship

2023

  • Longfor Scholarship

2023

  • Ganzhou Scholarship

2022

  • Longfor Scholarship

2022

  • Tsinghua First-class Scholarship

2021

  • Student Grant from USENIX FAST

2021

Selected Awards Before Ph.D.

  • Outstanding Graduate of Nanjing University

2019

  • Tung OOCL Scholarship (5%)

2018

  • National Scholarship (2%)

2017

  • National Second Prize, China Undergraduate Mathematical Contest in Modeling

2017

  • Meritorious Winner, MCM/ICM

2017

  • Tung OOCL Scholarship (5%)

2016

  • Excellent Student of Nanjing University (5%)

2016


Services

  • EuroSys 2023, Artifact reviewer
  • SIGCOMM 2022, Artifact reviewer
  • USENIX ATC 2022, Artifact reviewer
  • OSDI 2022, Artifact reviewer
  • IEEE Transactions on Parallel and Distributed Systems (TPDS), 2022, Reviewer
  • EuroSys 2022, Artifact reviewer
  • Long-term volunteer of ChinaSys

Invited Talks

  • Fleche - efficient GPU-resident embedding cache
    • NVIDIA, Beijing, China - May 26, 2022
    • EuroSys’22, Rennes, France - Apr 04, 2022
  • Kraken - memory efficient continual learning for at-scale recommendation systems
    • Huawei, Beijing, China - Mar 25, 2022
    • Tsinghua, Beijing, China - Nov 19, 2020
    • SC’20, San Diego, US - Nov 11, 2020

Teaching

  • TA, Computer Organization and Architecture, Tsinghua University, Spring 2022
  • TA, Computer Organization and Architecture, Tsinghua University, Spring 2021
  • TA, Computer Organization and Architecture, Tsinghua University, Spring 2020
  • TA, Introduction to Computer Systems, Nanjing University, Fall 2017