Zijie Xin 辛梓杰

AI/CS Ph.D. student at Renmin University of China, Beijing


email: xinzijie@ruc.edu.cn

I am a first-year Ph.D. student in the AI & Media Computing Lab at Renmin University of China, advised by Prof. Xirong Li.

I obtained my Bachelor’s degree with honors in the Top-notch Program (a class of 15 elite students selected from 400+) from Sichuan University in 2024, under the supervision of Prof. Qijun Zhao. I’ve interned at Tencent and Kuaishou.

My research primarily revolves around multimedia learning, video understanding, cross-modal retrieval, and open-set recognition, complemented by a broad curiosity about generative models, RAG, RL, and LLMs.

News

Jul 7, 2025 I joined Tencent as a research intern working on video understanding.
Jul 5, 2025 Our paper on Ad-hoc Video Search has been accepted to ACMMM 2025! 🎉
Jun 26, 2025 Two of our papers, on Music Grounding by Short Video and Sketch Animation, have been accepted to ICCV 2025! I’m proud to be the first author of MGSV. 🎉
Mar 21, 2025 Our paper on Text-based Person Search has been accepted to ICME 2025! Congratulations to Yuchuan! 🎉
Sep 7, 2024 I officially started my PhD at RUC under the supervision of Professor Xirong Li in the AIMC Lab. 👨‍🎓
Jun 28, 2024 I graduated with my bachelor’s degree from SCU and was honored as an Outstanding Graduate of Sichuan University! 👨‍🎓
Feb 27, 2024 Our paper on a Multi-Grained Teaching Strategy for Efficient Text-to-Video Retrieval has been accepted to CVPR 2024! 🎉
Nov 29, 2023 I joined Kuaishou as a research intern working on video-music retrieval.
Oct 8, 2023 After heading to Beijing and joining the AI & Media Computing Lab, I unofficially started my PhD adventure! 🤓
Apr 20, 2023 I joined GeWu-Lab as a short-term intern.

Publications

* Equal Contribution | † Corresponding Author

  1. Music Grounding by Short Video
    Zijie Xin, Minquan Wang, Jingyu Liu, Ye Ma, Quan Chen, Peng Jiang, and Xirong Li†
    In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025
  2. Holistic Features are almost Sufficient for Text-to-Video Retrieval
    Kaibin Tian*, Ruixiang Zhao*, Zijie Xin, Bangxiang Lan, and Xirong Li†
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
  3. Learning Partially-Decorrelated Common Spaces for Ad-hoc Video Search
    Fan Hu, Zijie Xin, and Xirong Li†
    In Proceedings of the 33rd ACM International Conference on Multimedia (ACMMM), 2025
  4. Multi-Object Sketch Animation by Scene Decomposition and Motion Planning
    Jingyu Liu, Zijie Xin, Yuhan Fu, Ruixiang Zhao, Bangxiang Lan, and Xirong Li†
    In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025
  5. DAPL: Integration of Positive and Negative Descriptions in Text-Based Person Search
    Yuchuan Deng, Zhanpeng Hu, Zijie Xin, Chuang Deng, and Qijun Zhao†
    In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), 2025