How image search works at Dropbox

# · 🔥 185 · 💬 103 · 2 years ago · dropbox.tech · hyzyla · 📷

Wouldn't it be great if Dropbox could pore through all those images for you instead, and call out those which best match a few descriptive words that you dictated? That's pretty much what our image search does. In this post we'll describe the core idea behind our image content search method, based on techniques from machine learning, then discuss how we built a performant implementation on Dropbox's existing search infrastructure. Here's a simple way to state the image search problem: find a relevance function that takes a query q and an image j, and returns a relevance score s indicating how well the image matches the query. An image classifier reads an image and outputs a scored list of categories that describe its contents. If we can extract a meaningful representation of the query in this space, we can interpret how close the image vector is to the query vector as a measure of how well the image matches the query. This lets us calculate qc = , a vector in the C-dimensional category space which represents how well the query matches each category, just as the image classifier vector for each image represents how well the image matches each category. We combine image content search for general images, OCR-based search for images of documents, and full-text search for text documents to make most of these users' files available through content-based search.