From Zero to Billion Row Analytics with Exasol Personal

DataTalks.Club
Mar 13, 2026

Why It Matters

By walking through a free pipeline for billion‑row health data, the tutorial shows that teams can prototype large analytics projects without costly infrastructure, which shortens time to insight and reduces cost.

Key Takeaways

  • Exasol Personal enables free, in‑memory analytics for billion‑row datasets.
  • Setup requires an AWS account, the Exasol Launcher, and a Terraform configuration.
  • Data ingestion loads raw files into a staging schema, then transforms them into a production schema.
  • Automation is orchestrated via Python scripts and the Kestra workflow engine.
  • A dashboard is built after loading, trimming, and querying the prescription data.

Summary

The video is a hands‑on workshop in which the presenter builds a data pipeline capable of processing one billion prescription records from the UK’s GP prescribing dataset using Exasol Personal, a free edition of the Exasol in‑memory columnar analytical database.

He explains the dataset’s scale (about 10 million rows per month) and outlines the two‑phase ingestion strategy: a staging area that receives the raw CSV files (address, chemicals, prescriptions) and a production schema that analysts query. The setup involves an AWS account, the Exasol Launcher CLI, and Terraform‑managed infrastructure, after which he connects to the database via DBeaver or Python.
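
The two‑phase idea can be sketched in a few lines of Python. This is a minimal illustration only: it uses Python's built‑in sqlite3 module as a stand‑in for Exasol, and the table names, column names, and sample rows are hypothetical rather than taken from the workshop. Raw values land in a staging table as text, and a single INSERT ... SELECT cleans and types them for the production table.

```python
import sqlite3


def run_two_phase_load():
    """Load raw rows into staging, then transform them into production.

    Assumption: SQLite stands in for Exasol, and the table and column
    names below are hypothetical, chosen only for illustration.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # SQLite has no schemas, so name prefixes stand in for the
    # staging and production schemas described in the video.
    cur.execute(
        "CREATE TABLE staging_prescriptions "
        "(practice_code TEXT, bnf_name TEXT, items TEXT)"
    )
    cur.execute(
        "CREATE TABLE prod_prescriptions "
        "(practice_code TEXT, bnf_name TEXT, items INTEGER)"
    )

    # Phase 1: raw values go into staging untouched, padding included.
    raw_rows = [
        ("A81001  ", "Paracetamol 500mg tablets      ", "  12"),
        ("  B82002", "Amoxicillin 250mg capsules", "7  "),
    ]
    cur.executemany(
        "INSERT INTO staging_prescriptions VALUES (?, ?, ?)", raw_rows
    )

    # Phase 2: trim and cast while copying into the production table.
    cur.execute(
        """
        INSERT INTO prod_prescriptions
        SELECT TRIM(practice_code),
               TRIM(bnf_name),
               CAST(TRIM(items) AS INTEGER)
        FROM staging_prescriptions
        """
    )
    conn.commit()

    rows = cur.execute(
        "SELECT practice_code, bnf_name, items "
        "FROM prod_prescriptions ORDER BY practice_code"
    ).fetchall()
    conn.close()
    return rows


prod_rows = run_two_phase_load()
```

Keeping the staging table all‑text means a malformed value never blocks the initial load; type errors surface in the transform step, where they are easier to inspect.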

Notable details include using curl to fetch a 10 KB preview of the 1 GB prescription file, handling CSV files with CRLF line endings and padding, and trimming whitespace through simple SQL before loading into final tables. The workflow is automated with Python scripts and the Kestra orchestrator, culminating in a dashboard visualizing prescribing trends.
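
A preview of this kind can also be reproduced in Python. The sketch below is an assumption about how such a preview might be fetched and parsed, not the presenter's actual commands: it requests only the first 10 KB with an HTTP Range header (comparable to curl's --range option), drops the last line because a byte cut usually truncates it, and strips CRLF endings and padding from each field. The URL and the sample column names are hypothetical.

```python
import csv
import io
import urllib.request

PREVIEW_BYTES = 10 * 1024  # 10 KB


def build_preview_request(url, n_bytes=PREVIEW_BYTES):
    """Build a request for only the first n_bytes of a remote file."""
    # Byte ranges are inclusive, so 0..n_bytes-1 covers n_bytes bytes.
    return urllib.request.Request(
        url, headers={"Range": f"bytes=0-{n_bytes - 1}"}
    )


def parse_preview(raw):
    """Parse a byte-limited CSV preview into rows of trimmed fields."""
    text = raw.decode("utf-8", errors="replace")
    # The byte cut usually lands mid-line, so keep only complete lines.
    cut = text.rfind("\n")
    if cut != -1:
        text = text[: cut + 1]
    # newline="" lets the csv module handle CRLF line endings itself.
    reader = csv.reader(io.StringIO(text, newline=""))
    return [[field.strip() for field in row] for row in reader]


# Hypothetical URL; the request is built here but not sent.
request = build_preview_request("https://example.org/prescriptions.csv")

# Sample bytes imitating a padded CRLF file cut off mid-row.
sample = (
    b"PRACTICE ,BNF_NAME      ,ITEMS\r\n"
    b"A81001   ,Paracetamol   ,  12\r\n"
    b"A8100"
)
preview_rows = parse_preview(sample)
```

To fetch a real preview, pass the request to urllib.request.urlopen and hand the returned bytes to parse_preview. Servers that ignore Range headers return the whole file, so it is worth checking for a 206 status before reading.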

This demonstration shows that even large‑scale health‑care analytics can be prototyped on a free personal data‑warehouse tier, lowering barriers for data scientists and encouraging broader adoption of modern, cloud‑native analytics stacks.

Original Description

Connect with DataTalks.Club:
- Join the community - https://datatalks.club/slack.html
- Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
- Check other upcoming events - https://lu.ma/dtc-events
Check our free online courses:
- ML Engineering course - http://mlzoomcamp.com
👋🏼 Support/inquiries
If you want to support our community, use this link - https://github.com/sponsors/alexeygrigorev
If you’re a company, reach us at alexey@datatalks.club
