Chien-Yi Wang
I am an Applied Scientist at Amazon Ring in Sunnyvale, CA. Previously, I worked as a Senior Research Scientist at NVIDIA Research and as a Senior Research SDE at the Microsoft AI R&D Center in Taiwan. I have 8+ years of experience specializing in computer vision research, deep learning-based model optimization, and machine learning service integration. My research focuses mainly on cross-modality representation learning, face modeling, and 2D/3D scene understanding. I am interested in revolutionizing machine learning systems from the bottom up, devising better problem-solving methods for challenging tasks, and learning new technologies and tools as the need arises.
Email / Google Scholar / LinkedIn / Twitter / GitHub
Research
I'm interested in computer vision and multi-modal representation learning.
BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL
Yu-Heng Hung,
Kai-Jie Lin,
Yu-Heng Lin,
Chien-Yi Wang*,
Ping-Chun Hsieh*
  (*=equal advising)
International Conference on Machine Learning (ICML) AutoRL Workshop, 2024   (Spotlight)
OpenReview
We present a generalized deep Q-learning framework and propose BOFormer, which substantiates the framework for Multi-Objective Bayesian Optimization (MOBO) via sequence modeling.
DoRA: Weight-Decomposed Low-Rank Adaptation
Shih-Yang Liu,
Chien-Yi Wang,
Hongxu Yin,
Pavlo Molchanov,
Yu-Chiang Frank Wang,
Kwang-Ting Cheng,
Min-Hung Chen
International Conference on Machine Learning (ICML), 2024   (Oral)
Project / arXiv / Code
We present DoRA, a new parameter-efficient fine-tuning approach that consistently outperforms LoRA in fine-tuning LLMs without incurring additional inference cost. The gains are particularly notable at smaller ranks, with a 37.2% improvement over LoRA at rank 8 and a 22.4% improvement at rank 4.
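For intuition, the sketch below illustrates the core idea of weight decomposition: the pretrained weight is factored into a magnitude vector and a direction matrix, and a LoRA-style low-rank update is applied only to the direction. This is a minimal PyTorch sketch under assumed shapes and initialization, not the released DoRA implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoRALinearSketch(nn.Module):
    """Minimal sketch of weight-decomposed low-rank adaptation.

    The frozen weight W0 is decomposed as W = m * (V / ||V||_c), where
    ||.||_c is the column-wise norm; LoRA factors B @ A update the
    direction V, while the magnitude m is trained directly.
    Illustrative only, not the official DoRA code.
    """
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        out_f, in_f = base.weight.shape
        self.w0 = base.weight.detach()                # frozen W0, (out, in)
        self.bias = None if base.bias is None else base.bias.detach()
        # trainable magnitude: one scalar per column of W0
        self.m = nn.Parameter(self.w0.norm(dim=0, keepdim=True))   # (1, in)
        # trainable low-rank directional update; B starts at zero so the
        # adapted layer initially matches the pretrained one
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = self.w0 + self.B @ self.A                 # updated direction
        v = v / v.norm(dim=0, keepdim=True)           # column-normalize
        return F.linear(x, self.m * v, self.bias)
```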
MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes
Bor-Shiun Wang,
Chien-Yi Wang*,
Wei-Chen Chiu*
  (*=equal advising)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024  
Project / arXiv / Code
We propose the Multi-Level Concept Prototypes Classifier (MCPNet), an inherently interpretable model that explains its predictions via concept prototypes drawn from low- to high-level layers of CNN models.
Probabilistic 3D Multi-Object Cooperative Tracking for Autonomous Driving via Differentiable Multi-Sensor Kalman Filter
Hsu-Kuang Chiu,
Chien-Yi Wang,
Min-Hung Chen,
Stephen F. Smith
IEEE International Conference on Robotics and Automation (ICRA), 2024  
Project / arXiv / Code
We propose a Differentiable Multi-Sensor Kalman Filter for 3D Multi-Object Cooperative Tracking (DMSTrack), which estimates the observation noise covariance of each detection from different Connected Autonomous Vehicles (CAVs) to better exploit the Kalman filter's theoretical optimality.
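As a rough illustration of this idea (under assumed shapes and a hypothetical covariance head, not the authors' released code), the sketch below runs a standard Kalman filter measurement update in which the observation noise covariance R is predicted per detection from its features, keeping the whole update differentiable:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedNoiseKFUpdate(nn.Module):
    """Kalman filter measurement update with a learned, per-detection
    observation noise covariance R (illustrative sketch only)."""
    def __init__(self, obs_dim: int, feat_dim: int):
        super().__init__()
        # hypothetical head mapping detection features -> diagonal of R
        self.r_head = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, obs_dim)
        )

    def forward(self, x, P, z, H, feat):
        # R = diag(softplus(.)) is positive-definite by construction
        R = torch.diag(F.softplus(self.r_head(feat)))
        S = H @ P @ H.T + R                         # innovation covariance
        K = P @ H.T @ torch.linalg.inv(S)           # Kalman gain
        x_new = x + K @ (z - H @ x)                 # updated state mean
        P_new = (torch.eye(x.numel()) - K @ H) @ P  # updated covariance
        return x_new, P_new
```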
RAPPER: Reinforced Rationale-Prompted Paradigm for Natural Language Explanation in Visual Question Answering
Kai-Po Chang,
Chi-Ping Huang,
Wei-Yuan Cheng,
Fu-En Yang,
Chien-Yi Wang,
Yung-Hsuan Lai,
Yu-Chiang Frank Wang
International Conference on Learning Representations (ICLR), 2024  
OpenReview
We introduce Rapper, a two-stage Reinforced Rationale-Prompted Paradigm for Natural Language Explanation (NLE) in Visual Question Answering (VQA).
Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation
Fu-En Yang,
Chien-Yi Wang,
Yu-Chiang Frank Wang
IEEE International Conference on Computer Vision (ICCV), 2023  
arXiv
To leverage robust representations from large-scale models while enabling efficient model personalization for heterogeneous clients, we propose pFedPG, a novel personalized federated learning framework with client-specific Prompt Generation: a personalized prompt generator at the server learns to produce client-specific visual prompts that efficiently adapt frozen backbones to local data distributions.
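A minimal sketch of the server-side idea (hypothetical shapes and module names, not the paper's implementation): each client has a learned descriptor from which the server generates prompt tokens, which are then prepended to the patch tokens of a frozen ViT backbone on the client side.

```python
import torch
import torch.nn as nn

class ClientPromptGenerator(nn.Module):
    """Server-side generator of client-specific visual prompts
    (illustrative sketch, not the pFedPG implementation)."""
    def __init__(self, num_clients: int, prompt_len: int = 4, dim: int = 768):
        super().__init__()
        self.client_embed = nn.Embedding(num_clients, dim)  # per-client descriptor
        self.to_prompt = nn.Linear(dim, prompt_len * dim)   # descriptor -> prompts
        self.prompt_len, self.dim = prompt_len, dim

    def forward(self, client_id: torch.Tensor) -> torch.Tensor:
        e = self.client_embed(client_id)                    # (B, dim)
        return self.to_prompt(e).view(-1, self.prompt_len, self.dim)

# usage sketch: prompts are prepended to a frozen backbone's tokens
gen = ClientPromptGenerator(num_clients=10)
prompts = gen(torch.tensor([3]))       # (1, 4, 768) for client 3
# tokens = torch.cat([prompts, patch_tokens], dim=1)  # fed to frozen ViT
```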
QuAVF: Quality-aware Audio-Visual Fusion for Ego4D Talking to Me Challenge
Hsi-Che Lin,
Chien-Yi Wang,
Min-Hung Chen,
Szu-Wei Fu,
Yu-Chiang Frank Wang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, 2023  
1st Place Winner @ Ego4D Talking-To-Me (TTM) track
arXiv
We propose the Quality-aware Audio-Visual Fusion (QuAVF) framework, which achieves 67.4% mean average precision (mAP) on the Ego4D talking-to-me (TTM) test set.
A Closer Look at Geometric Temporal Dynamics for Face Anti-Spoofing
Chih-Jung Chang,
Yaw-Chern Lee,
Shih-Hsuan Yao,
Min-Hung Chen,
Chien-Yi Wang,
Shang-Hong Lai,
Trista Pei-Chun Chen
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, 2023  
Best Paper Award
Paper
We propose the Geometry-Aware Interaction Network (GAIN), which exploits dense facial landmarks with a spatio-temporal graph convolutional network (ST-GCN) to build a more interpretable and modularized face anti-spoofing model.
RGB-D Face Recognition with Identity-Style Disentanglement and Depth Augmentation
Meng-Tzu Chiu,
Hsun-Ying Cheng,
Chien-Yi Wang,
Shang-Hong Lai
IEEE Transactions on Biometrics, Behavior, and Identity Science (TBIOM), 2023  
Paper
We propose to augment facial segmentation and depth maps to assist the RGB-D face recognition task. With multi-modal augmentation and identity-style disentanglement, the proposed RGB-D recognition model achieves superior performance on several benchmarks.
MixFairFace: Towards Ultimate Fairness via MixFair Adapter in Face Recognition
Fu-En Wang,
Chien-Yi Wang,
Min Sun,
Shang-Hong Lai
37th AAAI Conference on Artificial Intelligence (AAAI), 2023  
arXiv / Code
We propose the MixFair Adapter to determine and reduce the identity bias of training samples. In addition, to push toward ultimate fairness in face recognition, we propose a new evaluation protocol that fairly assesses the fairness of different approaches.
Generalized Face Anti-Spoofing via Multi-Task Learning and One-Side Meta Triplet Loss
Chu-Chun Chuang,
Chien-Yi Wang,
Shang-Hong Lai
IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2023  
arXiv
We introduce a multi-task meta-learning framework that learns more generalized features for face anti-spoofing.
PatchNet: A Simple Face Anti-Spoofing Framework via Fine-Grained Patch Recognition
Chien-Yi Wang,
Yu-Ding Lu,
Shang-Ta Yang,
Shang-Hong Lai
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022  
arXiv / Paper / Supp / Video
We propose PatchNet, which reformulates face anti-spoofing as a fine-grained patch-type recognition problem.
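Conceptually (with invented class definitions and a toy backbone, since the blurb only names the reformulation), the training signal can be pictured as classifying random face patches into fine-grained patch types, e.g. combinations of capture device and presenting material, rather than assigning a single binary live/spoof label to the whole face:

```python
import torch
import torch.nn as nn

# hypothetical fine-grained patch types: (capture device x material),
# e.g. live skin, printed paper, replay screen, across two camera domains
NUM_PATCH_TYPES = 6

class PatchTypeClassifier(nn.Module):
    """Sketch of patch-level fine-grained recognition for anti-spoofing."""
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.backbone = nn.Sequential(               # toy feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.head = nn.Linear(feat_dim, NUM_PATCH_TYPES)

    def forward(self, face: torch.Tensor, patch_size: int = 64):
        # sample one random patch per batch; real training uses many patches
        _, _, h, w = face.shape
        y = torch.randint(0, h - patch_size + 1, (1,)).item()
        x = torch.randint(0, w - patch_size + 1, (1,)).item()
        patch = face[:, :, y:y + patch_size, x:x + patch_size]
        return self.head(self.backbone(patch))       # patch-type logits
```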
Local-Adaptive Face Recognition via Graph-based Meta-Clustering and Regularized Adaptation
Chien-Yi Wang*,
Wenbin Zhu*,
Kuan Lun Tseng,
Shang-Hong Lai,
Baoyuan Wang
(*=equal contribution)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022  
arXiv / Paper / Supp / Video
We introduce a new problem setup called "local-adaptive face recognition (LaFR)" and propose graph-based meta-clustering and regularized adaptation modules to address face recognition in unseen environments.
FedFR: Joint Optimization Federated Framework for Generic and Personalized Face Recognition
Chih-Ting Liu*,
Chien-Yi Wang*,
Shao-Yi Chien,
Shang-Hong Lai
  (*=equal contribution)
36th AAAI Conference on Artificial Intelligence (AAAI), 2022   (Oral)
arXiv / Code (coming soon)
We propose FedFR, a federated learning based framework that improves the generic face representation in a privacy-aware manner. In addition, the framework jointly optimizes personalized models for the corresponding clients via the proposed Decoupled Feature Customization module.
Disentangled Representation with Dual-stage Feature Learning for Face Anti-spoofing
Yu-Chun Wang,
Chien-Yi Wang,
Shang-Hong Lai
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022  
arXiv / Paper / Supp
We propose a novel dual-stage disentangled representation learning method that efficiently untangles spoof-related features from irrelevant ones. Unlike previous FAS disentanglement works that use a one-stage architecture, we find that the dual-stage training design improves training stability and effectively encodes features for detecting unseen attack types.
High-Accuracy RGB-D Face Recognition via Segmentation-Aware Face Depth Estimation and Mask-Guided Attention Network
Meng-Tzu Chiu,
Hsun-Ying Cheng,
Chien-Yi Wang,
Shang-Hong Lai
IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2021   (Oral)
arXiv / Poster / Video
We propose to leverage pseudo facial segmentation and depth maps to assist the RGB-D face recognition task. With multi-modal augmentation, the proposed mask-guided RGB-D recognition model achieves superior performance on several benchmarks.
Unified Representation Learning for Cross Model Compatibility
Chien-Yi Wang,
Ya-Liang Chang,
Shang-Ta Yang,
Dong Chen,
Shang-Hong Lai
British Machine Vision Conference (BMVC), 2020  
arXiv / Paper
We propose a unified representation learning framework to address the Cross Model Compatibility (CMC) problem in visual search applications. The method can be applied to face recognition and person re-identification tasks.
A 3D Dynamic Scene Analysis Framework for Development of Intelligent Transportation Systems
Chien-Yi Wang,
Athma Narayanan,
Abhishek Patil,
Wei Zhan,
Yi-Ting Chen
IEEE Intelligent Vehicles Symposium (IV), 2018  
Paper / Video
We propose a 3D dynamic scene analysis framework as a first step toward driving scene understanding. Given a sequence of synchronized 2D and 3D sensory data, the framework systematically integrates different perception modules to obtain the 3D position, orientation, velocity, and category of traffic participants and the ego vehicle in a reconstructed, semantically labeled 3D traffic scene.
Robust Image Segmentation Using Contour-Guided Color Palette
Xiang Fu,
Chien-Yi Wang,
Chen Chen,
Changhu Wang,
C.-C. Jay Kuo
IEEE International Conference on Computer Vision (ICCV), 2015
Paper / Code
We propose the contour-guided color palette (CCP) for robust image segmentation, which efficiently integrates the contour and color cues of an image.