Running LLMs Locally Just Got Way Better - Ollama + MCP
Why It Matters
Local, tool‑enabled LLMs give enterprises privacy and cost control while unlocking the same automation capabilities previously limited to cloud services.
Key Takeaways
- Install Ollama, then pull a tool-calling LLM such as Gwendolyn 3.5.
- Verify that your GPU or unified memory can hold the model before downloading (see the sizing sketch after this list).
- Use the Zapier MCP server to expose 8,000+ integrations to the local model.
- Select a model with tool-calling capability; older LLMs cannot invoke external functions.
- Bridge Ollama to the Zapier MCP server through an MCP client to enable secure, real-time tool calls.
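Before pulling anything, a quick back-of-the-envelope estimate of the model's memory footprint from its parameter count and quantization width can save a failed download. This is a rough sketch only; the roughly 4-bit default quantization for typical Ollama pulls and the 1.2x overhead factor for KV cache and runtime buffers are assumptions, not figures from the video.

```python
def estimated_memory_gb(params_billion: float, bits_per_weight: float = 4.0,
                        overhead: float = 1.2) -> float:
    """Rough estimate: weight bytes plus a fudge factor for KV cache
    and runtime buffers (the 1.2x overhead is an assumption)."""
    weight_bytes = params_billion * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1e9

# Typical Ollama pulls are ~4-bit quantized by default.
for size in (27, 35):
    print(f"{size}B @ 4-bit -> ~{estimated_memory_gb(size):.1f} GB")
# 27B @ 4-bit -> ~16.2 GB
# 35B @ 4-bit -> ~21.0 GB
```

On a unified-memory Mac, the result needs to fit comfortably inside total RAM alongside the OS and other applications; on a discrete GPU it needs to fit in VRAM.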
Summary
The video walks viewers through setting up a private, locally‑run large language model using Ollama and connecting it to external services via the Zapier MCP server. It emphasizes that the combination delivers cloud‑level functionality—such as accessing Google, Notion, or Facebook Ads—without exposing data to third‑party APIs.
Key technical points include checking GPU or unified memory capacity, choosing a model that supports tool‑calling, and balancing parameter count against available RAM. The presenter demonstrates downloading the Gwendolyn 3.5 model (35 billion parameters) and explains why older LLMs without tool‑calling are unsuitable for automation tasks.
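A simple way to confirm that a pulled model actually supports tool calling is to send it one tool definition and check whether it returns structured tool calls. The sketch below uses the ollama Python package (a recent version with object-style responses is assumed); the model tag and the get_weather tool are placeholders for illustration, not names from the video.

```python
import ollama

# One tool definition in the JSON-schema format the Ollama chat API expects.
# "get_weather" is a hypothetical example tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = ollama.chat(
    model="llama3.1",  # placeholder tag; use whichever tool-capable model you pulled
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)

if response.message.tool_calls:
    # A tool-capable model returns structured calls instead of plain text.
    for call in response.message.tool_calls:
        print(call.function.name, call.function.arguments)
else:
    print("Plain-text reply only; the model likely lacks tool-calling support.")
    print(response.message.content)
```

A model without tool-calling support will only ever produce the plain-text branch, which is why it cannot drive automations reliably.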
A practical example shows an MCP client acting as a bridge between Ollama and Zapier's MCP server, enabling real‑time calls to over 8,000 integrations. The speaker notes that each call counts as a credit on a Zapier plan, though the free tier typically suffices for modest usage, and highlights the speed difference between running 27‑billion and 35‑billion‑parameter models on an M2 Max Mac.
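The bridging pattern can be outlined with the official mcp Python SDK: connect to the Zapier MCP endpoint, fetch its tool list, hand those tools to the local model via Ollama, and route any resulting tool calls back over MCP. This is a minimal sketch of the flow, not the exact client shown in the video; the endpoint URL is a placeholder (Zapier issues a per-account URL), and the model tag is again an assumption.

```python
import asyncio
import ollama
from mcp import ClientSession
from mcp.client.sse import sse_client

# Placeholder: Zapier issues a per-account MCP endpoint URL.
ZAPIER_MCP_URL = "https://actions.zapier.com/mcp/YOUR-KEY/sse"


async def main() -> None:
    async with sse_client(ZAPIER_MCP_URL) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Expose Zapier's MCP tools to the local model as Ollama tool schemas.
            mcp_tools = (await session.list_tools()).tools
            ollama_tools = [{
                "type": "function",
                "function": {
                    "name": t.name,
                    "description": t.description or "",
                    "parameters": t.inputSchema,
                },
            } for t in mcp_tools]

            response = ollama.chat(
                model="llama3.1",  # placeholder; use your tool-capable local model
                messages=[{"role": "user",
                           "content": "Add a row to my tasks database in Notion."}],
                tools=ollama_tools,
            )

            # Route each tool call from the local model back to Zapier over MCP.
            for call in response.message.tool_calls or []:
                result = await session.call_tool(call.function.name,
                                                 call.function.arguments)
                print(result.content)


asyncio.run(main())
```

Because the model runs locally, only the tool calls and their arguments leave the machine; prompts and intermediate reasoning stay on your own hardware.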
The overall implication is that businesses and developers can now run powerful, privacy‑preserving AI agents on commodity hardware, cutting cloud costs while retaining the ability to automate workflows across a vast ecosystem of tools.