Video • Feb 10, 2026
Building Scalable GenAI Inference Pipelines with Spark NLP with David Talby
David Talby of Pacific AI showcases Spark NLP, an Apache‑2.0 open‑source library that enables enterprise‑grade natural language processing at petabyte scale on standard Spark clusters. He highlights three core use cases: generating embeddings to populate vector stores for retrieval‑augmented generation (RAG), running batch inference for tasks such as summarization and translation without costly LLM APIs, and performing multimodal information extraction across text, images, and speech. The discussion emphasizes how Spark NLP makes GenAI inference affordable over massive data volumes by running it on Spark clusters instead of through costly LLM APIs.