With the rapid growth of multimodal data on the web, cross-modal retrieval has become increasingly important for applications such as multimedia search and content recommendation. It aims to align ...