
Similarity Search with Jeff Johnson

Software Engineering Daily

Originally posted on Software Engineering Daily

Querying a search index for objects similar to a given object is a common problem. A user who has just read a great news article might want to read articles similar to it. A user who has just taken a picture of a dog might want to search for dog photos similar to it. In both of these cases, the query object is turned into a vector and compared to the vectors representing the objects in the search index.
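The episode describes this comparison step only at a high level. Below is a minimal sketch of exhaustive (brute-force) similarity search. It assumes each object has already been turned into a fixed-length vector and uses cosine similarity as the comparison measure; both are illustrative assumptions, not necessarily what Facebook uses. The item names and three-dimensional vectors are hypothetical toy data, and real embeddings usually have far more dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def search(
    index: Dict[str, Sequence[float]], query: Sequence[float], k: int = 3
) -> List[Tuple[str, float]]:
    """Score the query against every indexed vector and return the top k."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in index.items()]
    # Highest similarity first; keep only the k best matches.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy index: two "dog photo" vectors and one "news article" vector.
    toy_index = {
        "dog_photo_1": [0.9, 0.1, 0.0],
        "dog_photo_2": [0.8, 0.2, 0.1],
        "news_article": [0.0, 0.1, 0.9],
    }
    query_vector = [1.0, 0.0, 0.0]  # stands in for a new dog photo
    for item_id, score in search(toy_index, query_vector, k=2):
        print(item_id, round(score, 3))
```

Because this approach compares the query against every vector in the index, its cost grows linearly with the index size. That cost is what makes the speed, memory usage, and accuracy tradeoffs discussed in the episode matter at Facebook's scale.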

Facebook contains a lot of news articles and a lot of dog pictures. How do you index and query all that information efficiently? Much of that data is unlabeled. How can you use deep learning to classify entities and produce richer, more informative vectors?

Jeff Johnson is a research engineer at Facebook. He joins the show to discuss how similarity search works at scale, including how to represent that data and the tradeoffs of this kind of search engine across speed, memory usage, and accuracy.

Sponsors


Spring Framework gives developers an environment for building cloud-native projects. On December 4th-7th, SpringOne Platform is coming to San Francisco. SpringOne Platform is a conference where developers congregate to explore the latest technologies in the Spring ecosystem and beyond. Speakers at SpringOne Platform include Eric Brewer (who created the CAP theorem), Vaughn Vernon (who writes extensively about Domain-Driven Design), and many thought leaders in the Spring ecosystem. SpringOne Platform is the premier conference for those who build, deploy, and run cloud-native software. Software Engineering Daily listeners can sign up with the discount code SEDaily100 and receive $100 off a SpringOne Platform conference pass. I will also be at SpringOne reporting on developments in the cloud-native ecosystem. Join me December 4th-7th at the SpringOne Platform conference, and use discount code SEDaily100 for $100 off your conference pass.


GrammaTech CodeSonar helps development teams improve code quality with static analysis. It flags issues early in the development process, allowing developers to release better code faster. CodeSonar can easily be integrated into any development process. CodeSonar performs advanced static analysis of C, C++, Java, and even raw binary code, using unique dataflow and symbolic execution analysis to aggressively scan for problems in your code. Just like battleships use sonar to detect objects deep underwater, engineers use CodeSonar to detect subtle problems deep within their code. Go to go.grammatech.com/sedaily to get your free 30-day trial, exclusively for Software Engineering Daily listeners, and unleash the power of advanced static analysis.


Bugsnag makes troubleshooting errors more enjoyable and less time-consuming. For example, when an error occurs, your team can get notified via Slack, see diagnostic information on the error, and identify the developer who committed the code. Bugsnag’s integration with Jira and other collaboration tools makes it easy to assign and track bugs as they are being fixed. There is a special offer for Software Engineering Daily listeners: try all features free for 60 days at https://www.bugsnag.com/sedaily. Development teams can now iterate faster and improve software quality. To get started, go to https://www.bugsnag.com/sedaily/ and get up and running in three minutes. Airbnb, Lyft, and Shopify all use Bugsnag to monitor application errors.


Thanks to Symphono for sponsoring Software Engineering Daily. Symphono is a custom engineering shop where senior engineers tackle big tech challenges while learning from each other. Check it out at symphono.com/sedaily. Symphono has been a sponsor of Software Engineering Daily for almost a year now, and your continued support allows us to deliver content to listeners on a regular basis.



About the Podcast