RAG Project: Company Policy Bot

Analytics Vidhya
Mar 20, 2026

Why It Matters

A reliable, citation‑backed policy assistant reduces HR query load and mitigates compliance risk, turning static documents into actionable, searchable knowledge.

Key Takeaways

  • Hybrid retrieval combines dense embeddings with BM25 keyword search.
  • Splitting the PDF into overlapping 700‑character chunks improves semantic matching.
  • LangChain orchestrates retrieval, context formatting, and the OpenAI LLM response.
  • Evaluation uses faithfulness, answer relevancy, and contextual relevancy metrics.
  • Pipeline delivers grounded HR answers with source page citations.

Summary

The notebook demonstrates how to build a Retrieval‑Augmented Generation (RAG) pipeline that turns a static HR policy PDF into an interactive assistant. The workflow loads the document, splits it into overlapping 700‑character chunks, embeds each chunk with OpenAI embeddings, and stores the vectors in ChromaDB for dense retrieval. A BM25 keyword retriever is added alongside the dense store for exact term matching.
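The chunking step can be illustrated without any library. The following is a minimal sketch, not the notebook's code: the function name `split_into_chunks` is hypothetical, and the 100‑character overlap is an assumption, since the summary states only the 700‑character chunk size.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 700, overlap: int = 100) -> List[str]:
    """Split text into fixed-size character windows that share `overlap` characters.

    chunk_size=700 follows the summary; overlap=100 is an assumed value.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk is made up purely of overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


# Example: a 1,500-character document yields three chunks.
sample = "".join(chr(97 + i % 26) for i in range(1500))
pieces = split_into_chunks(sample)
print([len(p) for p in pieces])
```

The overlap means a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks, which is the usual motivation for overlapping windows. A production splitter would typically also prefer to break on paragraph or sentence boundaries rather than at a raw character offset.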

The core of the system is a hybrid retriever that merges dense semantic search with BM25 results and feeds the top four chunks into a LangChain RAG chain. A custom prompt instructs GPT‑4.1‑mini to answer strictly from the supplied context and cite the source page, so that responses stay grounded in the document and traceable to it. Sample queries, such as how to report sexual harassment, return concise answers with correct page references.
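One common way to merge a dense ranking with a keyword ranking is weighted reciprocal rank fusion. The sketch below assumes that technique for illustration; the summary does not state which fusion method or weights the notebook uses. The function names, the chunk ids, the constant `c=60`, the equal weights, and the prompt wording are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]],
    weights: Sequence[float],
    k: int = 4,
    c: int = 60,
) -> List[str]:
    """Fuse several ranked lists of chunk ids and return the top k ids.

    Each list contributes weight / (c + rank) to a chunk's score, so chunks
    ranked highly by both retrievers rise to the top.
    """
    if len(rankings) != len(weights):
        raise ValueError("one weight is required per ranking")
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (c + rank)
    # Sort by descending score; break ties by id so the result is deterministic.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:k]


def build_grounded_prompt(question: str, chunks: Sequence[Tuple[int, str]]) -> str:
    """Format (page, text) chunks into a prompt that asks for page citations."""
    context = "\n\n".join(f"[page {page}] {text}" for page, text in chunks)
    return (
        "Answer using only the context below. "
        "Cite the page number for each claim. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical rankings from the two retrievers (best match first).
dense_hits = ["c3", "c1", "c7", "c2"]
keyword_hits = ["c1", "c9", "c3", "c5"]
top_four = reciprocal_rank_fusion([dense_hits, keyword_hits], [0.5, 0.5], k=4)
print(top_four)
```

Rank-based fusion avoids comparing raw scores across retrievers, which matters here because embedding similarities and BM25 scores are on different scales. Adjusting the two weights is one way to tune how much the keyword side influences the final top four.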

Evaluation uses the DeepEval framework to measure faithfulness, answer relevancy, and contextual relevancy. Answer relevancy reached 100%, while faithfulness and contextual relevancy hovered around 66%, pointing to cases where retrieved passages were irrelevant or were partially misinterpreted by the model. The notebook details these metrics and suggests further tuning of the hybrid weighting.
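Metrics of this kind are commonly defined as ratios: faithfulness as the share of answer claims supported by the retrieved context, and contextual relevancy as the share of retrieved statements relevant to the question. The sketch below assumes that ratio formulation. The verdict lists are made‑up illustrations rather than the notebook's data, the 0.7 threshold is hypothetical, and in practice an LLM judge would produce the per‑item verdicts.

```python
from typing import List


def ratio_score(verdicts: List[bool]) -> float:
    """Return the fraction of verdicts that are True (0.0 for an empty list)."""
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)


def meets_threshold(score: float, threshold: float = 0.7) -> bool:
    """Check a score against a pass threshold (0.7 is an assumed value)."""
    return score >= threshold


# Illustrative verdicts only: two of three answer claims are supported
# by the context, and two of three retrieved statements are relevant.
claim_supported = [True, True, False]
statement_relevant = [True, False, True]

faithfulness = ratio_score(claim_supported)
contextual_relevancy = ratio_score(statement_relevant)

print(f"faithfulness: {faithfulness:.2f}, passes: {meets_threshold(faithfulness)}")
print(f"contextual relevancy: {contextual_relevancy:.2f}, "
      f"passes: {meets_threshold(contextual_relevancy)}")
```

Under this formulation a single unsupported claim or irrelevant passage in a small set lowers the score sharply, so scores near two thirds can come from just one miss per test case. Inspecting the individual verdicts is therefore more informative than the aggregate alone.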

The pipeline offers a scalable, low‑latency solution for internal knowledge bases, allowing employees to obtain policy‑compliant answers without manual document searches. Improving faithfulness will be critical for compliance‑sensitive environments, but the demonstrated architecture provides a solid foundation for enterprise‑wide AI assistants.

Original Description

Enroll via the link given below to download the IPYNB file:
