Vector Databases: Embeddings, Semantic Search, and Hybrid Retrieval - Alexey Grigorev

DataTalks.Club, May 18, 2026

Why It Matters

Semantic retrieval can make a chatbot's answers more relevant, which can speed up customer support and raise satisfaction, but it requires careful trade‑offs between retrieval quality and operational complexity.

Key Takeaways

  • Start with lexical BM25 search before adopting vector embeddings.
  • Vector search captures semantic similarity, handling varied query phrasing.
  • Use SentenceTransformers and PyTorch to generate document embeddings.
  • Index the embedded FAQ documents in a lightweight vector DB.
  • Hybrid retrieval combines text and vector search results, balancing answer relevance against resource costs.
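The first and last takeaways (BM25 first, hybrid later) can be sketched without any dependencies. Below is a minimal Okapi BM25 ranker over a few hypothetical FAQ entries, plus Reciprocal Rank Fusion (RRF) to merge its ranking with a vector-search ranking. The summary does not say which fusion method the workshop uses; RRF is swapped in here as one common choice, and the parameters (k1 = 1.5, b = 0.75, k = 60), the tokenizer, the FAQ texts, and the "semantic" ranking are all assumptions for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase, split on whitespace, strip basic punctuation.
    words = (w.strip(".,?!") for w in text.lower().split())
    return [w for w in words if w]


def bm25_rank(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Return document indices ordered by Okapi BM25 score, best first."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Non-negative IDF variant (the form Lucene's BM25 uses).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge ranked lists of document indices; each list adds 1 / (k + rank)."""
    fused: Counter = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


# Hypothetical FAQ entries, for illustration only.
faq = [
    "Can I still join the course after the start date?",
    "How do I install the dependencies for the homework?",
    "Where can I find the deadlines for the project?",
]

lexical = bm25_rank("can I still join", faq)
semantic = [1, 0, 2]  # hypothetical ranking from a vector search
print(reciprocal_rank_fusion([lexical, semantic]))  # [0, 1, 2]
```

RRF works on ranks rather than raw scores, which sidesteps the fact that BM25 scores and cosine similarities live on different scales.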

Summary

The session walks through building a FAQ chatbot for the LLM Zoomcamp, focusing on vector databases, embeddings, semantic search, and hybrid retrieval. It works as a standalone workshop within a larger course on real‑world LLM applications. Key points include the contrast between traditional lexical BM25 search and modern vector search, the operational overhead of generating embeddings, and the recommendation to start with text search before moving to semantic methods.

Participants install heavy dependencies such as PyTorch and SentenceTransformers to turn FAQ entries into dense vectors, and the instructor points out how large the transformer library downloads are. A concrete example compares two user queries, “I just discovered the course, can I still join?” and “I just found out about the program, can I still enroll?”, to show how embeddings bridge lexical gaps: the queries mean the same thing, yet the words that carry that meaning differ. The instructor first demonstrates word‑level embeddings, then moves on to sentence embeddings.

Vector search can substantially improve answer relevance for support bots, but it adds infrastructure complexity. A hybrid approach that merges BM25 results with semantic matches offers a pragmatic balance, letting businesses improve self‑service while keeping resource costs under control.
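The lexical gap in the two example queries can be made concrete with a small, dependency-free sketch: the words they share are mostly function words, while the meaning-bearing ones (course/program, join/enroll) differ, so a keyword match has little to work with. The second half shows the core operation behind vector search, ranking documents by cosine similarity to a query vector. In the workshop the vectors come from SentenceTransformers (e.g. via `SentenceTransformer(...).encode(...)`); here hypothetical 3-dimensional vectors stand in so the sketch runs without PyTorch, and the brute-force loop is an assumption standing in for what a vector DB typically does with an approximate nearest-neighbour index.

```python
import math


def tokens(text: str) -> set[str]:
    # Lowercased set of words with basic punctuation stripped.
    return {w.strip(".,?!") for w in text.lower().split()} - {""}


def jaccard(a: str, b: str) -> float:
    # Fraction of distinct words the two texts have in common.
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


def cosine(u: list[float], v: list[float]) -> float:
    # Cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[int]:
    # Brute-force nearest neighbours: score every document, keep the k best.
    scored = sorted(((cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)), reverse=True)
    return [i for _, i in scored[:k]]


q1 = "I just discovered the course, can I still join?"
q2 = "I just found out about the program, can I still enroll?"
print(sorted(tokens(q1) & tokens(q2)))  # ['can', 'i', 'just', 'still', 'the']
print(round(jaccard(q1, q2), 3))        # 0.385

# Hypothetical 3-d "embeddings"; real sentence models typically use hundreds of dimensions.
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.2], [0.0, 0.2, 0.9]]
print(top_k([0.8, 0.2, 0.1], doc_vecs))  # [0, 1]
```

With real embeddings, the two example queries would be expected to land close together even though their token overlap is low; that property is what the toy vectors here only imitate.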

Original Description

Links:
Connect with DataTalks.Club:
- Join the community - https://datatalks.club/slack.html
- Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
- Check other upcoming events - https://lu.ma/dtc-events
Connect with Alexey
Check our free online courses:
- ML Engineering course - http://mlzoomcamp.com
👋🏼 Support/inquiries
If you want to support our community, use this link - https://github.com/sponsors/alexeygrigorev
If you’re a company, reach us at alexey@datatalks.club
