Hi there!

I am a second-year Ph.D. student in Computer Science, jointly at Shanghai AI Lab and USTC, advised by Prof. Xiaogang Wang and Prof. Wanli Ouyang. I work closely with Prof. Tong He. I earned my B.S. degree from the Artificial Intelligence Honor Class at Shanghai Jiao Tong University, advised by Prof. Cewu Lu. I have also had the privilege of working with Dr. Hao-Shu Fang and Dr. Jim Fan.

Research for fun and truth. My current research interests focus on embodied AI, robot manipulation, and 3D vision. Feel free to follow me for the latest research announcements and updates!

In my personal life, I am a passionate (but amateur) fan of football, music, literature, philosophy, traditional Chinese painting, and modern Chinese poetry!

“The philosophers have only interpreted the world, in various ways. The point, however, is to change it.”

News

  • Oct. 2024: SPA has been announced! SPA is a novel representation learning framework that emphasizes the importance of 3D spatial awareness in embodied AI. Paper, code, and pre-trained models are all open-sourced! Check it out!
  • Sep. 2024: PointCloudMatters is accepted by NeurIPS D&B 2024! We show that explicit representations such as point clouds can significantly enhance the performance and generalization ability of robot learning policies. The code is open-sourced!
  • Feb. 2024: UniPAD is accepted by CVPR 2024! Check out our code!
  • Oct. 2023: PonderV2 and UniPAD have been announced! PonderV2 is a universal pre-training paradigm for 3D vision, paving the way for 3D foundation models. It achieves SOTA on 11 indoor and outdoor benchmarks. Check out our paper and code!
  • Jul. 2023: RH20T has been announced! RH20T is a large-scale open-source robotic dataset for learning diverse skills in one shot, comprising over 110,000 contact-rich robot manipulation sequences across diverse skills, contexts, robots, and camera viewpoints, all collected in the real world. Please check out our website for the latest updates!
  • Nov. 2022: MineDojo has won the 🎉 Outstanding Paper Award 🎉 at NeurIPS! See the announcement.
  • Nov. 2022: The AlphaPose paper is accepted by TPAMI! AlphaPose is an accurate multi-person pose estimator that has received more than 6.5K stars on GitHub. Check out the paper for more details and feel free to star it on GitHub!
  • Oct. 2022: X-NeRF is accepted by WACV 2023! Check out our code!
  • Jun. 2022: MineDojo has been announced! MineDojo is a new framework for building generally capable agents with internet-scale knowledge in Minecraft. Paper, code, and databases are all open access. Check it out today!

Interests
  • Embodied AI
  • Computer Vision
  • Robot Learning

Education
  • B.S. in Artificial Intelligence Honor Class, 2019 - 2023

    Shanghai Jiao Tong University

  • Ph.D. in Computer Science, 2023 - Present

    Shanghai AI Lab & USTC

Publications

Visit my Google Scholar page for a comprehensive listing!

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
arXiv preprint, 2024.
Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
Neural Information Processing Systems (NeurIPS) Dataset & Benchmark, 2024.
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
arXiv preprint.
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
RH20T: A Robotic Dataset for Learning Diverse Skills in One-Shot
IEEE International Conference on Robotics and Automation (ICRA), 2024.
AlphaTracker: a multi-animal tracking and behavioral analysis tool
Frontiers in Behavioral Neuroscience, 2023.
AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
X-NeRF: Explicit Neural Radiance Field for Multi-Scene 360 Insufficient RGB-D Views
IEEE Winter Conference on Applications of Computer Vision (WACV), 2023.
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
💫✨ Outstanding Paper Award ✨💫. Neural Information Processing Systems (NeurIPS) Dataset & Benchmark, 2022.
Unsupervised Multi-Task Learning for 3D Subtomogram Image Alignment, Clustering and Segmentation
IEEE International Conference on Image Processing (ICIP), 2022.

Experience

Shanghai AI Lab & USTC
Ph.D. in Computer Science
Sep 2023 – Present Shanghai, China

Shanghai AI Lab
Research Intern
Nov 2022 – Present Shanghai, China
  • Conducting AI research on 3D vision, foundation models, and embodied AI.
  • Joint Ph.D. with USTC.

MVIG, SJTU
B.S. in Artificial Intelligence
Sep 2019 – Jun 2023 Shanghai, China

Jim Team, NVIDIA AI Lab and Caltech
Remote Research Intern
Feb 2022 – Feb 2023 Shanghai, China

Xu Lab, CMU
Remote Research Intern
Apr 2021 – Feb 2022 Shanghai, China

Poems

Some of my modern Chinese poems

石头 (Stone)
孤独:永恒 (Solitude: Eternity)
凌晨随笔 (Early-Morning Jottings)
emo