BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning
Proceedings of the AAAI Conference on Artificial Intelligence, 37(9), 10637-10647.
Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, and Nan Duan