I am a research scientist at Apple AI/ML. My research focuses on vision+language and text-guided visual editing, with the goal of bridging the gap between visual and textual modalities. I obtained my Ph.D. from UCSB CS, advised by William Yang Wang.
2024 - Present
Research Scientist @ Apple AI/ML
2019 - 2024
Research Assistant @ UCSB NLP
2018 - 2019
Research Assistant @ Academia Sinica CKIP
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation
Weixi Feng, Jiachen Li, Michael Saxon, Tsu-Jui Fu, Wenhu Chen, and William Yang Wang
arXiv:2406.08656
Paper / Project / Code
From Text to Pixel: Advancing Long-Context Understanding in MLLMs
Yujie Lu, Xiujun Li, Tsu-Jui Fu, Miguel Eckstein, and William Yang Wang
arXiv:2405.14213
Paper
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback
Jiachen Li, Weixi Feng, Tsu-Jui Fu, Xinyi Wang, Sugato Basu, Wenhu Chen, and William Yang Wang
NeurIPS'24
Paper / Project / Code
Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners
Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, and Xin Eric Wang
TMLR'24
Paper / Project / Code
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Haotian Zhang*, Haoxuan You*, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, and Yinfei Yang
COLM'24
Paper
Guiding Instruction-based Image Editing via Multimodal Large Language Models
Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, and Zhe Gan
ICLR'24 (Spotlight)
Paper / Project / Slide / Video / Code
VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View
Raphael Schumann, Wanrong Zhu, Weixi Feng, Tsu-Jui Fu, Stefan Riezler, and William Yang Wang
AAAI'24
Paper / Project / Code
Text-guided 3D Human Generation from 2D Collections
Tsu-Jui Fu, Wenhan Xiong, Yixin Nie, Jingyu Liu, Barlas Oğuz, and William Yang Wang
EMNLP'23 (Findings)
Paper / Project / Slide / Video / Dataset
EDIS: Entity-Driven Image Search over Multimodal Web Content
Siqi Liu*, Weixi Feng*, Tsu-Jui Fu, Wenhu Chen, and William Yang Wang
EMNLP'23 (Long)
Paper / Code
Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation
Wanrong Zhu, Xinyi Wang, Yujie Lu, Tsu-Jui Fu, Xin Eric Wang, Miguel Eckstein, and William Yang Wang
EMNLP'23 (Short)
Paper
Photoswap: Personalized Subject Swapping in Images
Jing Gu, Yilin Wang, Nanxuan Zhao, Tsu-Jui Fu, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, Hyunjoon Jung, and Xin Eric Wang
NeurIPS'23
Paper / Project / Slide / Code
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
Weixi Feng*, Wanrong Zhu*, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, and William Yang Wang
NeurIPS'23
Paper / Project / Code
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
Tsu-Jui Fu, Licheng Yu, Ning Zhang, Cheng-Yang Fu, Jong-Chyi Su, William Yang Wang, and Sean Bell
CVPR'23
Paper / Project / Slide / Video / Code
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Tsu-Jui Fu*, Linjie Li*, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, and Zicheng Liu
CVPR'23
Paper / Slide / Video / Code
Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis
Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, and William Yang Wang
ICLR'23
Paper / Project / Code
ULN: Towards Underspecified Vision-and-Language Navigation
Weixi Feng, Tsu-Jui Fu, Yujie Lu, and William Yang Wang
EMNLP'22 (Long)
Paper / Slide / Video / Code
CPL: Counterfactual Prompt Learning for Vision and Language Models
Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, and Xin Eric Wang
EMNLP'22 (Long)
Paper / Video / Code
Language-Driven Artistic Style Transfer
Tsu-Jui Fu, Xin Eric Wang, and William Yang Wang
ECCV'22
Paper / Project / Slide / Video / Code
M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformer
Tsu-Jui Fu, Xin Eric Wang, Scott Grafton, Miguel Eckstein, and William Yang Wang
CVPR'22
Paper / Slide / Video / Dataset
DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents
Tsu-Jui Fu, William Yang Wang, Daniel McDuff, and Yale Song
AAAI'22
Paper / Project / Slide / Video / Code
VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, and Zicheng Liu
arXiv:2111.12681
Paper / Code
H-FND: Hierarchical False-Negative Denoising for Distant Supervision Relation Extraction
Jhih-Wei Chen*, Tsu-Jui Fu*, Chen-Kang Lee, and Wei-Yun Ma
ACL'21 (Findings)
Paper / Slide / Video / Code
Semi-Supervised Policy Initialization for Playing Games with Language Hints
Tsu-Jui Fu and William Yang Wang
NAACL'21 (Short)
Paper / Slide / Video / Code
L2C: Describing Visual Differences Needs Semantic Understanding of Individuals
An Yan, Xin Eric Wang, Tsu-Jui Fu, and William Yang Wang
EACL'21 (Short)
Paper / Slide
Multimodal Style Transfer Learning for Outdoor Vision-and-Language Navigation
Wanrong Zhu, Xin Eric Wang, Tsu-Jui Fu, An Yan, Pradyumna Narayana, Kazoo Sone, Sugato Basu, and William Yang Wang
EACL'21 (Long)
Paper / Slide / Code
SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning
Tsu-Jui Fu, Xin Eric Wang, Scott Grafton, Miguel Eckstein, and William Yang Wang
EMNLP'20 (Oral)
Paper / Slide / Code
Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler
Tsu-Jui Fu, Xin Eric Wang, Matthew Peterson, Scott Grafton, Miguel Eckstein, and William Yang Wang
ECCV'20 (Spotlight)
Paper / Slide / Video / Model
Why Attention? Analyzing and Remedying BiLSTM Deficiency in Modeling Cross-Context for NER
Peng-Hsuan Li, Tsu-Jui Fu, and Wei-Yun Ma
AAAI'20 (Oral)
Paper / Code
Learning from Observation-Only Demonstration for Task-Oriented Language Grounding via Self-Examination
Tsu-Jui Fu, Yuta Tsuboi, Sosuke Kobayashi, and Yuta Kikuchi
NeurIPSW'19 (ViGIL workshop)
Paper
A Distributed Scheme for Accelerating Semantic Video Segmentation on An Embedded Cluster
Hsuan-Kung Yang*, Tsu-Jui Fu*, Kuan-Wei Ho, Po-Han Chiang, and Chun-Yi Lee
ICCD'19 (Oral)
Paper / Video
Adversarial Active Exploration for Inverse Dynamics Model Learning
Zhang-Wei Hong, Tsu-Jui Fu, Tzu-Yun Shann, Yi-Hsiang Chang, and Chun-Yi Lee
CoRL'19 (Oral)
Paper
GraphRel: Modeling Text as Relational Graphs for Joint Entity and Relation Extraction
Tsu-Jui Fu, Peng-Hsuan Li, and Wei-Yun Ma
ACL'19 (Long)
Paper / Slide / Code
Attentive and Adversarial Learning for Video Summarization
Tsu-Jui Fu, Shao-Heng Tai, and Hwann-Tzong Chen
WACV'19 (Oral)
Paper / Video / Code
Region-Semantics Preserving Image Synthesis
Kang-Jun Liu, Tsu-Jui Fu, and Shan-Hung Wu
ACCV'18
Paper / Video / Code
Diversity-Driven Exploration Strategy for Deep Reinforcement Learning
Zhang-Wei Hong, Tzu-Yun Shann, Shih-Yang Su, Yi-Hsiang Chang, Tsu-Jui Fu, and Chun-Yi Lee
NeurIPS'18
Paper / Video
Speed Reading: Learning to Read ForBackward via Shuttle
Tsu-Jui Fu and Wei-Yun Ma
EMNLP'18 (Long)
Paper / Code
Visual Relationship Prediction via Label Clustering and Incorporation of Depth Information
Hsuan-Kung Yang, An-Chieh Cheng*, Kuan-Wei Ho*, Tsu-Jui Fu, and Chun-Yi Lee
ECCVW'18 (PIC workshop)
Paper
Dynamic Video Segmentation Network
Yu-Syuan Xu, Tsu-Jui Fu*, Hsuan-Kung Yang*, and Chun-Yi Lee
CVPR'18
Paper / Video / Code
Summer 2023
Research Intern @ Apple AI/ML
Advisors: Zhe Gan and Yinfei Yang
Summer 2022
Research Intern @ Meta AI
Advisors: Licheng Yu and Sean Bell
Summer 2021
Research Intern @ Microsoft Azure AI
Advisors: Linjie Li, Zhe Gan, and Lijuan Wang
Summer 2020
Research Intern @ Microsoft Research
Advisors: Yale Song and Daniel McDuff
Summer 2019
Research Intern @ Preferred Networks
Advisors: Yuta Tsuboi and Jason Naradowsky