
The video introduces vector search as a modern alternative to traditional keyword‑based search systems, explaining why the latter often fail to capture user intent. It outlines how keyword search relies on exact token matching, requiring precise terms and extensive synonym lists, an approach that struggles with conversational language, typos, and semantic nuance. Key insights include the limitations of token‑based indexing, the manual effort needed to maintain synonym dictionaries, and the emergence of embedding‑based retrieval that measures similarity in a high‑dimensional space. The presenter demonstrates the workflow: documents are transformed into vectors, stored in a specialized vector database, and queried via nearest‑neighbor algorithms that return results based on meaning rather than literal word overlap. Examples range from a failed "Alaskan fish" query that misses relevant items to e‑commerce scenarios where shoppers describe products in natural language. The speaker references a November blog post and a Python code demo that illustrate building a vector search engine from scratch, emphasizing the practical steps needed to transition from token tables to embedding stores. The shift to vector search has significant business implications: it promises more accurate results for internal knowledge bases, customer‑facing product searches, and developer tooling, reducing friction and boosting conversion rates. Companies adopting vector databases can automate semantic matching, cut maintenance overhead, and stay competitive as user expectations evolve toward conversational interfaces.
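The embed-store-query workflow described above can be sketched in a few lines. This is a minimal illustration, assuming made-up 3-dimensional vectors in place of real model embeddings and a plain list in place of the vector database the video uses; it only shows nearest-neighbor ranking by meaning rather than token overlap.

```python
# Minimal sketch of embedding-based retrieval. The toy vectors below are
# invented stand-ins for real embeddings: semantically related documents
# are simply given nearby vectors so ranking behavior is visible.
import numpy as np

docs = {
    "salmon fishing in Alaska": np.array([0.9, 0.8, 0.1]),
    "arctic seafood recipes":   np.array([0.8, 0.7, 0.2]),
    "used car listings":        np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    # Cosine similarity: angle-based closeness in the embedding space.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query_vec, k=2):
    # Rank every stored document by similarity to the query vector.
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

# An "Alaskan fish" query embedded as a nearby toy vector: note that
# "arctic seafood recipes" ranks highly with zero shared tokens.
print(search(np.array([0.85, 0.75, 0.15])))
```

A real system swaps the dictionary for a vector database with an approximate nearest-neighbor index, but the ranking principle is the same.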

The discussion centers on how Fortune 100 enterprises are actually implementing knowledge graphs, contrasting idealized, organization‑wide visions with the pragmatic routes companies are taking today. Two adoption patterns emerge. Large firms often build an “enterprise knowledge graph” that mirrors portions of their...

The webinar hosted by Nabeha and Isma discussed scaling AI beyond single agents, focusing on multi‑agent architectures using LangChain. It outlined fundamentals of AI agents—LLM brain, tools, memory—and why monolithic agents struggle as tasks grow. The presenters highlighted token‑bloat, context‑window exhaustion,...
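The supervisor-and-specialists pattern the presenters describe can be sketched in plain Python. The agent functions below are illustrative stubs, not LangChain APIs; a real build would back each specialist with its own LLM, tools, and memory.

```python
# Plain-Python sketch of delegating work to specialist agents so that
# each one sees only its slice of the context, instead of one monolithic
# prompt that bloats tokens and exhausts the context window.
# All names here (route, SPECIALISTS, the agents) are illustrative stubs.

def research_agent(task: str) -> str:
    # Stub: a real agent would call an LLM with retrieval tools.
    return f"research notes on {task!r}"

def writer_agent(task: str) -> str:
    # Stub: a real agent would draft text from upstream notes.
    return f"draft based on {task!r}"

SPECIALISTS = {"research": research_agent, "write": writer_agent}

def route(task: str) -> str:
    # A supervisor inspects the task and hands it to one specialist.
    kind = "research" if "find" in task else "write"
    return SPECIALISTS[kind](task)

print(route("find recent papers on agent memory"))
```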

Yuri Zilai’s webinar introduced flow matching as a next‑generation alternative to diffusion‑based generative AI. He outlined the agenda—reviewing fundamental generative models, dissecting diffusion, explaining flow‑matching mechanics, showcasing real‑world deployments, and presenting a live 2‑D notebook demo. All generative models map Gaussian noise...
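The noise-to-data mapping at the heart of the talk can be sketched with the conditional flow-matching objective. This is a minimal numpy sketch assuming straight-line interpolation paths; the tiny callable "models" are toy stand-ins for the network trained in the notebook demo.

```python
# Conditional flow matching in miniature: sample noise x0 and data x1,
# interpolate x_t = (1-t) x0 + t x1, and regress a velocity model toward
# the straight-line target x1 - x0. Model callables here are toy stand-ins.
import numpy as np

rng = np.random.default_rng(0)

def cfm_loss(predict, x0, x1, t):
    xt = (1 - t)[:, None] * x0 + t[:, None] * x1  # point on the path
    target = x1 - x0                              # constant path velocity
    pred = predict(xt, t)
    return float(np.mean((pred - target) ** 2))

x0 = rng.standard_normal((256, 2))           # Gaussian noise samples
x1 = rng.standard_normal((256, 2)) + 3.0     # toy "data": shifted Gaussian

t = rng.uniform(size=256)

# For this toy data the true mean velocity is the constant shift +3,
# so a model predicting it scores a lower loss than one predicting zero.
good = lambda xt, t: np.full_like(xt, 3.0)
bad = lambda xt, t: np.zeros_like(xt)
print(cfm_loss(good, x0, x1, t), cfm_loss(bad, x0, x1, t))
```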

The video walks viewers through building custom text embeddings with a SentenceTransformers model from HuggingFace and loading them into a Weaviate vector database. The presenter demonstrates the workflow in a Google Colab notebook, pulling a subset of 100 arXiv paper...

The video features Joshua Starmer discussing how to explain complex data‑science concepts without "dumbing them down." He emphasizes a constant self‑check: can the idea be presented more simply while staying true to the original algorithm and its intent? This mindset...

In a Data Science Dojo webinar, Zaid Ahmed led a workshop on the Agent-to-Agent (A2A) protocol, positioning it alongside Model Context Protocol (MCP) as a solution for building interoperable multi-agent systems. He recapped MCP’s role in wrapping APIs for LLM...
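The interoperability idea behind A2A can be sketched without the protocol machinery: each agent publishes a small capability card, and a client discovers and invokes agents by capability rather than through hard-coded integrations. The card fields below are illustrative only, not the actual A2A schema.

```python
# Plain-Python sketch of capability-based agent discovery. The registry,
# card fields, and handlers are all hypothetical; real A2A exchanges
# structured messages over the network between independent agents.

AGENTS = [
    {"name": "summarizer", "skills": ["summarize"],
     "handler": lambda text: text[:20] + "..."},
    {"name": "translator", "skills": ["translate"],
     "handler": lambda text: f"[fr] {text}"},
]

def discover(skill):
    # Return the first registered agent advertising the requested skill.
    for card in AGENTS:
        if skill in card["skills"]:
            return card
    raise LookupError(skill)

agent = discover("translate")
print(agent["name"], agent["handler"]("hello"))
```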

The video walks viewers through the MTEB (Massive Text Embedding Benchmark) leaderboard, positioning it as a practical guide for selecting open‑source embedding models and tuning modules for vector‑search applications. The presenter highlights recent UI changes—new benchmarks, language options, and domain‑specific...

The video centers on the persistent problem of AI hallucinations—instances where large language models generate plausible‑but‑incorrect information—and asks how much trust users can place in these systems. Joshua Starmer, speaking with Data Science Dojo, argues that while the technology will improve,...

The video walks viewers through the decision‑making process for selecting an embedding model, a critical component in building vector‑database‑driven applications. It contrasts two concrete examples—a modern open‑source BERT‑base model and a proprietary OpenAI offering—while acknowledging the overwhelming variety of alternatives...
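One practical criterion implied by the comparison above is storage cost, which scales with embedding width. In the back-of-the-envelope sketch below, 768 is BERT-base's hidden size; 1536 is used as a typical OpenAI embedding width, since the exact proprietary model is not named in the summary.

```python
# Back-of-the-envelope index sizing: float32 vectors cost
# documents x dimensions x 4 bytes. Wider embeddings roughly double
# storage (and similarity-computation work) for the same corpus.

def index_size_mb(n_docs: int, dim: int, bytes_per_float: int = 4) -> float:
    return n_docs * dim * bytes_per_float / 1e6

for dim in (768, 1536):   # BERT-base width vs. a typical OpenAI width
    print(dim, index_size_mb(1_000_000, dim), "MB")
```

Accuracy on the target domain still dominates the decision, but width, latency, and hosting cost are the knobs this arithmetic makes visible.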

The video “From Word2Vec to Transformers | Vector Databases for Beginners | Part 4” walks viewers through the historical shift from static, word‑level embeddings to context‑aware transformer‑based models. It opens by recapping the shortcomings of early techniques like Word2Vec—namely their...
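The static-embedding shortcoming the video recaps can be shown in a few lines. The vectors below are made up purely for illustration; the point is structural: a Word2Vec-style lookup table assigns one fixed vector per word, so both senses of "bank" collapse into the same representation, whereas a contextual transformer would produce different vectors.

```python
# Toy static-embedding table: one vector per word, context ignored.
# Vector values are invented for illustration only.
import numpy as np

static_embeddings = {
    "bank":  np.array([0.2, 0.9]),
    "river": np.array([0.8, 0.1]),
    "loan":  np.array([0.1, 0.7]),
}

def embed(sentence):
    # One fixed vector per known token: context plays no role.
    return [static_embeddings[w] for w in sentence.split()
            if w in static_embeddings]

v1 = embed("bank by the river")[0]   # financial sense? riverbank sense?
v2 = embed("bank approved the loan")[0]
print(np.array_equal(v1, v2))  # → True: both senses get the same vector
```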

In a candid conversation with Data Science Dojo, Joshua Starmer explains the guiding principle behind his instructional videos: constantly asking, “Can a topic be any simpler without dumbing it down?” He frames this question as a litmus test for clarity,...