ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
arXiv preprint arXiv:2306.00103, 2023.
Xiao Xu, Bei Li, Chenfei Wu, Shao-Yen Tseng, Anahita Bhiwandiwalla, Shachar Rosenman, Vasudev Lal, Wanxiang Che, and Nan Duan