📝 Publications

A full publication list is available on my Google Scholar page.

ICML 2025

[ICML 2025] How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation
Yining Pan, Qiongjie Cui, Xulei Yang, Na Zhao [Project page]

  • This paper proposes the Image-Assists-LiDAR (IAL) model, which harmonizes LiDAR and images through synchronized augmentation, token fusion, and prior query generation.
  • IAL achieves state-of-the-art performance on 3D panoptic segmentation benchmarks, outperforming baseline methods by over 4%.

CVPR 2024

[CVPR 2024] InstructVideo: Instructing Video Diffusion Models with Human Feedback
H. Yuan, S. Zhang, X. Wang, Y. Wei, T. Feng, Yining Pan, Y. Zhang, Z. Liu, S. Albanie, D. Ni
[Project page]

  • InstructVideo is the first research attempt to instruct video diffusion models with human feedback.
  • InstructVideo significantly enhances the visual quality of generated videos without compromising generalization capabilities, with merely 0.1% of the parameters being fine-tuned.

ICCV 2023

[ICCV 2023] RLIPv2: Fast Scaling of Relational Language-Image Pre-training
H. Yuan, S. Zhang, X. Wang, S. Albanie, Yining Pan, T. Feng, J. Jiang, D. Ni, Y. Zhang, D. Zhao

  • RLIPv2 elevates RLIP by leveraging a new language-image fusion mechanism, designed for expansive data scales.