📝 Publications

A full publication list is available on my Google Scholar page.


[CVPR 2026] PanDA: Unsupervised Domain Adaptation for Multimodal 3D Panoptic Segmentation in Autonomous Driving
Yining Pan, Shijie Li, Yuchen Wu, Xulei Yang, Na Zhao

  • PanDA studies unsupervised domain adaptation for multimodal 3D panoptic segmentation in autonomous driving.
  • It combines asymmetric multimodal drop and dual-expert pseudo-label refinement to improve robustness under domain shifts.

[CVPR 2026] CCF: Complementary Collaborative Fusion for Domain Generalized Multi-Modal 3D Object Detection
Yuchen Wu, Kun Wang, Yining Pan, Na Zhao
[Project page]

  • CCF targets domain-generalized multi-modal 3D object detection for autonomous driving.
  • It rebalances camera and LiDAR queries with query-decoupled loss, LiDAR-guided depth priors, and complementary cross-modal masking.

[ICML 2025] How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation
Yining Pan, Qiongjie Cui, Xulei Yang, Na Zhao
[Project page]

  • This paper proposes the Image-Assists-LiDAR (IAL) model, which harmonizes LiDAR and images through synchronized augmentation, token fusion, and prior query generation.
  • IAL achieves state-of-the-art performance on 3D panoptic segmentation benchmarks, outperforming baseline methods by over 4%.

[CVPR 2024] InstructVideo: Instructing Video Diffusion Models with Human Feedback
H. Yuan, S. Zhang, X. Wang, Y. Wei, T. Feng, Yining Pan, Y. Zhang, Z. Liu, S. Albanie, D. Ni
[Project page]

  • InstructVideo is the first research attempt to instruct video diffusion models with human feedback.
  • InstructVideo significantly enhances the visual quality of generated videos without compromising generalization capabilities, with merely 0.1% of the parameters being fine-tuned.

[ICCV 2023] RLIPv2: Fast Scaling of Relational Language-Image Pre-training
H. Yuan, S. Zhang, X. Wang, S. Albanie, Yining Pan, T. Feng, J. Jiang, D. Ni, Y. Zhang, D. Zhao

  • RLIPv2 elevates RLIP by leveraging a new language-image fusion mechanism, designed for expansive data scales.