Chien-Yi Wang
I am an Applied Scientist at Amazon Ring in Sunnyvale, CA. Previously, I worked as a Senior Research Scientist at NVIDIA Research and as a Senior Research SDE at the Microsoft AI R&D Center in Taiwan. I have 8+ years of experience specializing in computer vision research, deep learning-based model optimization, and machine learning service integration. My research focuses mainly on cross-modality representation learning, face modeling, and 2D/3D scene understanding. I am interested in revolutionizing machine learning systems from the bottom up, devising better problem-solving methods for challenging tasks, and learning new technologies and tools as the need arises.
Email / Google Scholar / LinkedIn / Twitter / GitHub
Research
I'm interested in computer vision and multi-modal representation learning.
BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL
Yu-Heng Hung,
Kai-Jie Lin,
Yu-Heng Lin,
Chien-Yi Wang*,
Ping-Chun Hsieh*
  (*=equal advising)
International Conference on Machine Learning (ICML) AutoRL Workshop, 2024   (Spotlight)
OpenReview
We present a generalized deep Q-learning framework and propose BOFormer, which substantiates the framework for Multi-Objective Bayesian Optimization (MOBO) via sequence modeling.
DoRA: Weight-Decomposed Low-Rank Adaptation
Shih-Yang Liu,
Chien-Yi Wang,
Hongxu Yin,
Pavlo Molchanov,
Yu-Chiang Frank Wang,
Kwang-Ting Cheng,
Min-Hung Chen
International Conference on Machine Learning (ICML), 2024   (Oral)
Project / arXiv / Code
We present DoRA, a new parameter-efficient fine-tuning approach that consistently outperforms LoRA in fine-tuning LLMs without incurring additional inference cost. The gains are particularly notable at smaller ranks, with a 37.2% improvement over LoRA at rank 8 and a 22.4% improvement at rank 4.
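For intuition, the sketch below illustrates the core idea of weight decomposition: the pretrained weight is factored into a magnitude vector and a direction matrix, and a LoRA-style low-rank update is applied only to the direction. This is a minimal PyTorch sketch under assumed shapes and initialization, not the released DoRA implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoRALinearSketch(nn.Module):
    """Minimal sketch of weight-decomposed low-rank adaptation.

    The frozen weight W0 is decomposed as W = m * (V / ||V||_c), where
    ||.||_c is the column-wise norm; LoRA factors B @ A update the
    direction V, while the magnitude m is trained directly.
    Illustrative only, not the official DoRA code.
    """
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        out_f, in_f = base.weight.shape
        self.w0 = base.weight.detach()                # frozen W0, (out, in)
        self.bias = None if base.bias is None else base.bias.detach()
        # trainable magnitude: one scalar per column of W0
        self.m = nn.Parameter(self.w0.norm(dim=0, keepdim=True))   # (1, in)
        # trainable low-rank directional update; B starts at zero so the
        # adapted layer initially matches the pretrained one
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = self.w0 + self.B @ self.A                 # updated direction
        v = v / v.norm(dim=0, keepdim=True)           # column-normalize
        return F.linear(x, self.m * v, self.bias)
```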
MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes
Bor-Shiun Wang,
Chien-Yi Wang*,
Wei-Chen Chiu*
  (*=equal advising)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024  
Project / arXiv / Code
We propose the Multi-Level Concept Prototypes Classifier (MCPNet), an inherently interpretable model that explains its predictions via concept prototypes drawn from low- to high-level layers of CNN models.
Probabilistic 3D Multi-Object Cooperative Tracking for Autonomous Driving via Differentiable Multi-Sensor Kalman Filter
Hsu-Kuang Chiu,
Chien-Yi Wang,
Min-Hung Chen,
Stephen F. Smith
IEEE International Conference on Robotics and Automation (ICRA), 2024  
Project / arXiv / Code
We propose a Differentiable Multi-Sensor Kalman Filter for 3D Multi-Object Cooperative Tracking (DMSTrack), which estimates the observation noise covariance of each detection from different Connected Autonomous Vehicles (CAVs) to better exploit the Kalman filter's theoretical optimality.
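As a rough illustration of this idea (under assumed shapes and a hypothetical covariance head, not the authors' released code), the sketch below runs a standard Kalman filter measurement update in which the observation noise covariance R is predicted per detection from its features, keeping the whole update differentiable:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedNoiseKFUpdate(nn.Module):
    """Kalman filter measurement update with a learned, per-detection
    observation noise covariance R (illustrative sketch only)."""
    def __init__(self, obs_dim: int, feat_dim: int):
        super().__init__()
        # hypothetical head mapping detection features -> diagonal of R
        self.r_head = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, obs_dim)
        )

    def forward(self, x, P, z, H, feat):
        # R = diag(softplus(.)) is positive-definite by construction
        R = torch.diag(F.softplus(self.r_head(feat)))
        S = H @ P @ H.T + R                         # innovation covariance
        K = P @ H.T @ torch.linalg.inv(S)           # Kalman gain
        x_new = x + K @ (z - H @ x)                 # updated state mean
        P_new = (torch.eye(x.numel()) - K @ H) @ P  # updated covariance
        return x_new, P_new
```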
RAPPER: Reinforced Rationale-Prompted Paradigm for Natural Language Explanation in Visual Question Answering
Kai-Po Chang,
Chi-Ping Huang,
Wei-Yuan Cheng,
Fu-En Yang,
Chien-Yi Wang,
Yung-Hsuan Lai,
Yu-Chiang Frank Wang
International Conference on Learning Representations (ICLR), 2024  
OpenReview
We introduce Rapper, a two-stage Reinforced Rationale-Prompted Paradigm for Natural Language Explanation (NLE) in Visual Question Answering (VQA).
Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation
Fu-En Yang,
Chien-Yi Wang,
Yu-Chiang Frank Wang
IEEE International Conference on Computer Vision (ICCV), 2023  
arXiv
To leverage robust representations from large-scale models while enabling efficient model personalization for heterogeneous clients, we propose pFedPG, a novel personalized federated learning framework with client-specific Prompt Generation: a personalized prompt generator at the server learns to produce client-specific visual prompts that efficiently adapt frozen backbones to local data distributions.
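A minimal sketch of the server-side idea (hypothetical shapes and module names, not the paper's implementation): each client has a learned descriptor from which the server generates prompt tokens, which are then prepended to the patch tokens of a frozen ViT backbone on the client side.

```python
import torch
import torch.nn as nn

class ClientPromptGenerator(nn.Module):
    """Server-side generator of client-specific visual prompts
    (illustrative sketch, not the pFedPG implementation)."""
    def __init__(self, num_clients: int, prompt_len: int = 4, dim: int = 768):
        super().__init__()
        self.client_embed = nn.Embedding(num_clients, dim)  # per-client descriptor
        self.to_prompt = nn.Linear(dim, prompt_len * dim)   # descriptor -> prompts
        self.prompt_len, self.dim = prompt_len, dim

    def forward(self, client_id: torch.Tensor) -> torch.Tensor:
        e = self.client_embed(client_id)                    # (B, dim)
        return self.to_prompt(e).view(-1, self.prompt_len, self.dim)

# usage sketch: prompts are prepended to a frozen backbone's tokens
gen = ClientPromptGenerator(num_clients=10)
prompts = gen(torch.tensor([3]))       # (1, 4, 768) for client 3
# tokens = torch.cat([prompts, patch_tokens], dim=1)  # fed to frozen ViT
```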
QuAVF: Quality-aware Audio-Visual Fusion for Ego4D Talking to Me Challenge
Hsi-Che Lin,
Chien-Yi Wang,
Min-Hung Chen,
Szu-Wei Fu,
Yu-Chiang Frank Wang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, 2023  
1st Place Winner @ Ego4D Talking-To-Me (TTM) track
arXiv
We propose the Quality-aware Audio-Visual Fusion (QuAVF) framework, which achieves 67.4% mean average precision (mAP) on the Ego4D talking-to-me (TTM) test set.
A Closer Look at Geometric Temporal Dynamics for Face Anti-Spoofing
Chih-Jung Chang,
Yaw-Chern Lee,
Shih-Hsuan Yao,
Min-Hung Chen,
Chien-Yi Wang,
Shang-Hong Lai,
Trista Pei-Chun Chen
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, 2023  
Best Paper Award
Paper
We propose the Geometry-Aware Interaction Network (GAIN), which exploits dense facial landmarks with a spatio-temporal graph convolutional network (ST-GCN) to build a more interpretable and modularized face anti-spoofing model.
RGB-D Face Recognition with Identity-Style Disentanglement and Depth Augmentation
Meng-Tzu Chiu,
Hsun-Ying Cheng,
Chien-Yi Wang,
Shang-Hong Lai
IEEE Transactions on Biometrics, Behavior, and Identity Science (TBIOM), 2023  
Paper
We propose to augment facial segmentation and depth maps to assist the RGB-D face recognition task. With multi-modal augmentation and identity-style disentanglement, the proposed RGB-D recognition model achieves superior performance on several benchmarks.
MixFairFace: Towards Ultimate Fairness via MixFair Adapter in Face Recognition
Fu-En Wang,
Chien-Yi Wang,
Min Sun,
Shang-Hong Lai
37th AAAI Conference on Artificial Intelligence (AAAI), 2023  
arXiv / Code
We propose the MixFair Adapter to determine and reduce the identity bias of training samples. In addition, to push toward ultimate fairness in face recognition, we propose a new evaluation protocol that fairly assesses the fairness of different approaches.
Generalized Face Anti-Spoofing via Multi-Task Learning and One-Side Meta Triplet Loss
Chu-Chun Chuang,
Chien-Yi Wang,
Shang-Hong Lai
IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2023  
arXiv
We introduce a multi-task meta-learning framework that learns more generalized features for face anti-spoofing.
PatchNet: A Simple Face Anti-Spoofing Framework via Fine-Grained Patch Recognition
Chien-Yi Wang,
Yu-Ding Lu,
Shang-Ta Yang,
Shang-Hong Lai
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022  
arXiv / Paper / Supp / Video
We propose PatchNet, which reformulates face anti-spoofing as a fine-grained patch-type recognition problem.
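Conceptually (with invented class definitions and a toy backbone, since the blurb only names the reformulation), the training signal can be pictured as classifying random face patches into fine-grained patch types, e.g. combinations of capture device and presenting material, rather than assigning a single binary live/spoof label to the whole face:

```python
import torch
import torch.nn as nn

# hypothetical fine-grained patch types: (capture device x material),
# e.g. live skin, printed paper, replay screen, across two camera domains
NUM_PATCH_TYPES = 6

class PatchTypeClassifier(nn.Module):
    """Sketch of patch-level fine-grained recognition for anti-spoofing."""
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.backbone = nn.Sequential(               # toy feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.head = nn.Linear(feat_dim, NUM_PATCH_TYPES)

    def forward(self, face: torch.Tensor, patch_size: int = 64):
        # sample one random patch per batch; real training uses many patches
        _, _, h, w = face.shape
        y = torch.randint(0, h - patch_size + 1, (1,)).item()
        x = torch.randint(0, w - patch_size + 1, (1,)).item()
        patch = face[:, :, y:y + patch_size, x:x + patch_size]
        return self.head(self.backbone(patch))       # patch-type logits
```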
Local-Adaptive Face Recognition via Graph-based Meta-Clustering and Regularized Adaptation
Chien-Yi Wang*,
Wenbin Zhu*,
Kuan Lun Tseng,
Shang-Hong Lai,
Baoyuan Wang
(*=equal contribution)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022  
arXiv / Paper / Supp / Video
We introduce a new problem setup called "local-adaptive face recognition (LaFR)" and propose graph-based meta-clustering and regularized adaptation modules to address face recognition in unseen environments.
FedFR: Joint Optimization Federated Framework for Generic and Personalized Face Recognition
Chih-Ting Liu*,
Chien-Yi Wang*,
Shao-Yi Chien,
Shang-Hong Lai
  (*=equal contribution)
36th AAAI Conference on Artificial Intelligence (AAAI), 2022   (Oral)
arXiv / Code (coming soon)
We propose FedFR, a federated learning based framework that improves the generic face representation in a privacy-aware manner. In addition, the framework jointly optimizes personalized models for the corresponding clients via the proposed Decoupled Feature Customization module.
Disentangled Representation with Dual-stage Feature Learning for Face Anti-spoofing
Yu-Chun Wang,
Chien-Yi Wang,
Shang-Hong Lai
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022  
arXiv / Paper / Supp
We propose a novel dual-stage disentangled representation learning method that efficiently untangles spoof-related features from irrelevant ones. Unlike previous FAS disentanglement works that use a one-stage architecture, we find that the dual-stage training design improves training stability and effectively encodes features for detecting unseen attack types.
High-Accuracy RGB-D Face Recognition via Segmentation-Aware Face Depth Estimation and Mask-Guided Attention Network
Meng-Tzu Chiu,
Hsun-Ying Cheng,
Chien-Yi Wang,
Shang-Hong Lai
IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2021   (Oral)
arXiv / Poster / Video
We propose to leverage pseudo facial segmentation and depth maps to assist the RGB-D face recognition task. With multi-modal augmentation, the proposed mask-guided RGB-D recognition model achieves superior performance on several benchmarks.
Unified Representation Learning for Cross Model Compatibility
Chien-Yi Wang,
Ya-Liang Chang,
Shang-Ta Yang,
Dong Chen,
Shang-Hong Lai
British Machine Vision Conference (BMVC), 2020  
arXiv / Paper
We propose a unified representation learning framework to address the Cross Model Compatibility (CMC) problem in visual search applications. The method can be applied to face recognition and person re-identification tasks.
A 3D Dynamic Scene Analysis Framework for Development of Intelligent Transportation Systems
Chien-Yi Wang,
Athma Narayanan,
Abhishek Patil,
Wei Zhan,
Yi-Ting Chen
IEEE Intelligent Vehicles Symposium (IV), 2018  
Paper / Video
We propose a 3D dynamic scene analysis framework as a first step toward driving scene understanding. Given a sequence of synchronized 2D and 3D sensory data, the framework systematically integrates different perception modules to obtain the 3D position, orientation, velocity, and category of traffic participants and the ego vehicle in a reconstructed, semantically labeled 3D traffic scene.
Robust Image Segmentation Using Contour-Guided Color Palette
Xiang Fu,
Chien-Yi Wang,
Chen Chen,
Changhu Wang,
C.-C. Jay Kuo
IEEE International Conference on Computer Vision (ICCV), 2015
Paper / Code
We propose the contour-guided color palette (CCP) for robust image segmentation, which efficiently integrates the contour and color cues of an image.