Yining Pan

PhD student at SUTD | A*STAR, focusing on multi-modal perception, embodied AI, and autonomous driving

📝 Publications

A full publication list is available on my Google Scholar page.

[CVPR 2026] PanDA: Unsupervised Domain Adaptation for Multimodal 3D Panoptic Segmentation in Autonomous Driving
Yining Pan, Shijie Li, Yuchen Wu, Xulei Yang, Na Zhao

PanDA studies unsupervised domain adaptation for multimodal 3D panoptic segmentation in autonomous driving.
It combines asymmetric multimodal drop and dual-expert pseudo-label refinement to improve robustness under domain shifts.

[CVPR 2026] CCF: Complementary Collaborative Fusion for Domain Generalized Multi-Modal 3D Object Detection
Yuchen Wu, Kun Wang, Yining Pan, Na Zhao
[Project page]

CCF targets domain-generalized multi-modal 3D object detection for autonomous driving.
It rebalances camera and LiDAR queries with query-decoupled loss, LiDAR-guided depth priors, and complementary cross-modal masking.

[ICML 2025] How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation
Yining Pan, Qiongjie Cui, Xulei Yang, Na Zhao
[Project page]

This paper proposes the Image-Assists-LiDAR (IAL) model, which harmonizes LiDAR and images through synchronized augmentation, token fusion, and prior query generation.
IAL achieves state-of-the-art performance on 3D panoptic segmentation benchmarks, outperforming baseline methods by over 4%.

[CVPR 2024] InstructVideo: Instructing Video Diffusion Models with Human Feedback
H. Yuan, S. Zhang, X. Wang, Y. Wei, T. Feng, Yining Pan, Y. Zhang, Z. Liu, S. Albanie, D. Ni
[Project page]

InstructVideo is the first research attempt to instruct video diffusion models with human feedback.
It significantly enhances the visual quality of generated videos without compromising generalization capabilities, while fine-tuning merely 0.1% of the parameters.

[ICCV 2023] RLIPv2: Fast Scaling of Relational Language-Image Pre-training
H. Yuan, S. Zhang, X. Wang, S. Albanie, Yining Pan, T. Feng, J. Jiang, D. Ni, Y. Zhang, D. Zhao

RLIPv2 improves on RLIP with a new language-image fusion mechanism designed for large data scales.