Publications

* Equal Contribution | † Corresponding Author

Video Large Language Model

  1. Stage-adaptive Token Selection for Efficient Omni-modal LLMs
    Zijie Xin, Jie Yang, Ruixiang Zhao, Tianyi Wang, Fengyun Rao, Jing LYU, and Xirong Li†
    arXiv preprint, 2026
  2. OmniPro: A Comprehensive Benchmark for Omni-Proactive Streaming Video Understanding
    Ruixiang Zhao, Jie Yang, Zijie Xin, Tianyi Wang, Fengyun Rao, Jing LYU, and Xirong Li†
    arXiv preprint, 2026

Cross-modal Retrieval

  1. SAVE: Speech-Aware Video Representation Learning for Video-Text Retrieval
    Ruixiang Zhao*, Zhihao Xu*, Bangxiang Lan, Zijie Xin, Jingyu Liu, and Xirong Li†
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
  2. Music Grounding by Short Video
    Zijie Xin, Minquan Wang, Jingyu Liu, Ye Ma, Quan Chen, Peng Jiang, and Xirong Li†
    In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025
  3. Learning Partially-Decorrelated Common Spaces for Ad-hoc Video Search
    Fan Hu, Zijie Xin, and Xirong Li†
    In Proceedings of the 33rd ACM International Conference on Multimedia (ACMMM), 2025
  4. DAPL: Integration of Positive and Negative Descriptions in Text-Based Person Search
    Yuchuan Deng, Zhanpeng Hu, Zijie Xin, Chuang Deng, and Qijun Zhao†
    In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), 2025
  5. Holistic Features are almost Sufficient for Text-to-Video Retrieval
    Kaibin Tian*, Ruixiang Zhao*, Zijie Xin, Bangxiang Lan, and Xirong Li†
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Medical AI

  1. Fundus-R1: Training a Fundus-Reading MLLM with Knowledge-Aware Reasoning on Public Data
    Yuchuan Deng, Qijie Wei, Kaiheng Qian, Jiazhen Liu, Zijie Xin, Bangxiang Lan, Jingyu Liu, and 2 more authors
    arXiv preprint arXiv:2604.08322, 2026

Generative Model

  1. Multi-Object Sketch Animation by Scene Decomposition and Motion Planning
    Jingyu Liu, Zijie Xin, Yuhan Fu, Ruixiang Zhao, Bangxiang Lan, and Xirong Li†
    In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025
  2. Creating a 4D Sketch from a Single Image
    Jingyu Liu, Shuo Gao, Zijie Xin, Ruixiang Zhao, Bangxiang Lan, Yuchuan Deng, Qingwei Shen, and 3 more authors
    arXiv preprint, 2026