A multimodal search engine that projects audio and images into a shared embedding space to enable cross-modal retrieval.
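
As a rough illustration of the idea, the sketch below projects audio and image feature vectors into one shared space and ranks images against an audio query by cosine similarity. Everything in it is an assumption made for illustration: the names `project`, `search`, `AUDIO_W`, and `IMAGE_W` are hypothetical, the projection weights are hand-picked toy values standing in for trained encoders, and the feature vectors are made up. It is not the engine's actual implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def project(features: Sequence[float], weights: Matrix) -> Vector:
    """Linearly map modality-specific features into the shared space, then L2-normalize."""
    out = [sum(w * x for w, x in zip(row, features)) for row in weights]
    norm = math.sqrt(sum(v * v for v in out))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in out]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two unit-length vectors (reduces to a dot product)."""
    return sum(x * y for x, y in zip(a, b))


def search(query: Vector, index: Dict[str, Vector], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (item_id, score) pairs, highest similarity first."""
    scored = [(item_id, cosine(query, emb)) for item_id, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy projections (placeholders for trained encoders):
# 3-d audio features -> 2-d shared space, 4-d image features -> 2-d shared space.
AUDIO_W: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
IMAGE_W: Matrix = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

# Index made-up image features once, in the shared space.
image_index = {
    "dog.jpg": project([0.0, 0.9, 0.0, 0.1], IMAGE_W),
    "piano.jpg": project([0.0, 0.1, 0.0, 0.9], IMAGE_W),
}

# Cross-modal query: an audio clip is projected into the same space and compared to images.
audio_query = project([0.8, 0.2, 0.5], AUDIO_W)
results = search(audio_query, image_index, top_k=1)
print(results[0][0])
```

Because both modalities end up as unit vectors in the same space, a single similarity function can serve queries in either direction; a real system would presumably replace the linear search with an approximate nearest-neighbor index.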