EAR (Embedding Audio-visual Retrieval)

A multimodal search engine that projects audio and images into a shared embedding space to enable cross-modal retrieval.
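As a rough illustration of retrieval in a shared embedding space, the sketch below ranks items by cosine similarity and optionally restricts results to one modality. The embeddings, file names, and the `retrieve` helper are hypothetical placeholders invented for this example. The page does not say which encoders or similarity measure EAR actually uses, so cosine similarity is an assumption here.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def retrieve(query_vec, index, modality=None, top_k=3):
    """Rank indexed items by similarity to the query.

    If `modality` is given, only items of that modality are considered.
    """
    candidates = [
        item for item in index
        if modality is None or item["modality"] == modality
    ]
    ranked = sorted(
        candidates,
        key=lambda item: cosine(query_vec, item["embedding"]),
        reverse=True,
    )
    return [(item["id"], item["modality"]) for item in ranked[:top_k]]

# Toy, hand-made embeddings (assumed). In a real system these would come
# from audio and image encoders trained to share one vector space.
INDEX = [
    {"id": "dog_bark.wav", "modality": "audio", "embedding": [0.9, 0.1, 0.0]},
    {"id": "dog.jpg", "modality": "image", "embedding": [0.8, 0.2, 0.1]},
    {"id": "rain.wav", "modality": "audio", "embedding": [0.0, 0.9, 0.3]},
    {"id": "storm.jpg", "modality": "image", "embedding": [0.1, 0.8, 0.4]},
]

# Hypothetical embedding of an audio query (something like a dog bark).
query = [0.85, 0.15, 0.05]

same = retrieve(query, INDEX, modality="audio", top_k=1)   # same-modality search
cross = retrieve(query, INDEX, modality="image", top_k=1)  # cross-modality search
print(same, cross)
```

Because both modalities live in one vector space, the same ranking function serves both cases; only the candidate filter changes between a same-modality and a cross-modality search.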


Query Preview
Same Modality
Cross Modality