What Running AI Locally Means (And Why We Don't Use the Cloud)
Our AI runs on a Mac Studio in the office — no data leaves the building. Here's why we chose local over cloud, and what it actually costs.
When most people hear "AI," they think of ChatGPT, Google Gemini, or some other service you access through a browser. You type something in, it goes to a server farm somewhere in the United States, gets processed, and the answer comes back. That's cloud AI. It's what most businesses use, and for a lot of use cases, it works fine.
But there's another way to do it. You can run AI models on a computer you own, sitting in your office, plugged into your wall. No data leaves the building. No monthly API bills. No dependency on someone else's servers. That's local AI — and it's what we use to power everything we've built for our dental practice.
What "local AI" actually means
Let's strip away the jargon. An AI model is just a very large file — think of it as a program that's been trained to do a specific thing, like transcribe speech or summarise a document. Normally, that file lives on a server owned by OpenAI, Google, or Microsoft, and you pay them every time you use it.
With local AI, that file lives on your computer. When you ask it to do something, the work happens right there in your office. The data never touches the internet. There's no API call, no cloud server, no third party involved at all.
It's the difference between renting a car every time you need to drive somewhere and owning one outright. The rental might be newer and fancier, but the one in your garage is always available, costs nothing to use, and nobody else gets to see where you're going.
Our setup: one machine, four AI services
We run all of our AI on an Apple Mac Studio with the M4 Max chip. It's about the size of a lunch box, sits quietly on a shelf in the office, and draws less power than a desk lamp. On that single machine, we run four separate AI services:
- Speech-to-text (Whisper) — turns voice recordings into written text. This powers our dictation app, call transcriptions, and voice notes. A clinician speaks, and the words appear on screen in seconds.
- Text-to-speech — turns written text into natural-sounding voice. This is what our AI phone receptionist uses to talk to callers. It doesn't sound robotic — it sounds like a real person.
- Large Language Model (LLM) — the "brain." This is the part that reads emails and decides how to classify them, summarises phone calls, drafts responses, extracts information from documents, and makes decisions about what to do with incoming data.
- Embeddings — turns text into searchable vectors. This is a bit more technical, but in plain terms: it lets our AI memory system find related information even when the exact words don't match. Search for "crown prep" and it'll find notes about "porcelain restoration" because the two phrases sit close together in meaning, even though they share no words.
Every AI tool we've built — the smart inbox, the voice dictation, the document scanner, the daily briefings — runs through one or more of these four services. And all four run on that one machine in the office.
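To make the embeddings idea a little more concrete, here is a minimal Python sketch of how a meaning-based search can rank notes. The three-number vectors and the note texts are made up for illustration; a real embedding model produces much longer vectors, and this is not our production code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical, hand-written 3-number "embeddings". A real embedding model
# would output vectors with hundreds of dimensions for each piece of text.
NOTES = {
    "porcelain restoration on tooth 26": [0.9, 0.1, 0.0],
    "supplier invoice for gloves": [0.0, 0.2, 0.9],
    "patient rescheduled hygiene visit": [0.2, 0.9, 0.1],
}

def rank_notes(query_vector, notes=NOTES):
    """Return note texts sorted from most to least similar to the query."""
    return sorted(
        notes,
        key=lambda text: cosine_similarity(query_vector, notes[text]),
        reverse=True,
    )

# Pretend this vector came from embedding the search phrase "crown prep".
crown_prep_query = [0.8, 0.2, 0.1]
ranked = rank_notes(crown_prep_query)
print(ranked[0])  # the note whose vector points in the most similar direction
```

The search never compares words. It only compares directions of vectors, which is why a note can match a query it shares no vocabulary with.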
Why we chose local over cloud
We didn't do this to be clever. We did it because the alternatives had real downsides for our business. Here's what tipped the scales:
Privacy: your data stays in the building
We're a dental practice. We handle patient health records, medical histories, financial information, Medicare details. Every time you send that data to a cloud AI service, you're trusting that provider to handle it properly — their security, their data retention policies, their compliance with Australian privacy law.
With local AI, the question doesn't arise. Patient data never leaves our network, so there's no copy sitting on someone else's servers to be breached. For healthcare, legal, financial, or any business handling sensitive information, this is a significant advantage.
Cost: no per-request charges
Cloud AI bills you every time you use it. OpenAI charges per token (a token is roughly three-quarters of an English word). Google's AI APIs are billed by usage too. The pricing looks cheap until you scale up. Transcribing every phone call, classifying every email, processing every document: the requests add up fast.
Local AI has a one-time hardware cost and then runs essentially for free. We don't pay per transcription, per email classified, or per document processed. Whether we run ten requests a day or ten thousand, the extra cost per request is the same: effectively zero, apart from a little electricity. We go into the full cost breakdown in another post, but the short version is dramatic.
Speed: no internet round-trip
When you use cloud AI, your request has to travel to a server (usually in the US), get processed, and travel back. That's 200–500 milliseconds of latency before the AI even starts thinking. For a single request, you barely notice. For a voice agent having a real-time conversation, or a dictation tool that needs to feel instant, that lag is the difference between "this feels natural" and "this feels broken."
Our local AI responds in milliseconds. The speech-to-text transcription feels instantaneous. The phone receptionist doesn't have awkward pauses. Everything just feels faster because it is faster.
Reliability: works when the internet doesn't
If your internet goes down, cloud AI stops working. Full stop. Your transcription tool, your email classifier, your phone agent — all dead until the connection comes back.
Our AI doesn't care about the internet. The Mac Studio keeps running regardless. In Darwin, where tropical storms can knock out connectivity, this isn't a theoretical advantage — it's a practical one.
No vendor lock-in: you own everything
We own the hardware. The models are open-source or open-weight, so we keep our own copies. If Apple discontinues the Mac Studio tomorrow, the models run on any other machine with enough grunt. If one speech-to-text model gets outperformed by a newer one, we swap it out in an afternoon. No migration, no vendor negotiation, no "we're sunsetting this feature" email.
"But isn't cloud AI better?"
Sometimes, yes. Let's be honest about this.
For cutting-edge reasoning — the kind of thing GPT-4 and Claude do when you ask them to analyse a complex contract, write nuanced code, or think through a multi-step problem — cloud models are still ahead. The frontier models from OpenAI and Anthropic are trained on enormous clusters that no office machine can replicate.
But here's the thing: most business automation doesn't need frontier-level reasoning. Transcribing a phone call? A local Whisper model does that brilliantly. Classifying an email as "patient enquiry" vs "supplier invoice" vs "junk"? A local LLM handles that with near-perfect accuracy. Summarising a document, extracting a name and date from a form, generating a natural-sounding voice response? All well within the capabilities of models that run on a single machine.
We use a hybrid approach. Local AI handles everything it can — which is about 90% of our daily workload. For the rare tasks that genuinely need frontier-level intelligence, we use cloud AI selectively. This gives us the best of both worlds: privacy and cost savings for the routine work, and access to the most capable models when we actually need them.
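As a rough illustration of the hybrid approach, the routing decision can be as small as the Python sketch below. The task names, the `Task` fields, and the `choose_backend` function are hypothetical, invented for this example rather than taken from our actual system.

```python
from dataclasses import dataclass

@dataclass
class Task:
    kind: str                              # e.g. "classify_email" (hypothetical label)
    contains_patient_data: bool = False    # sensitive data must stay in the building
    needs_frontier_reasoning: bool = False # flagged by a human or an upstream rule

def choose_backend(task: Task) -> str:
    """Return "local" or "cloud", preferring local wherever possible."""
    # Rule 1 (assumed policy): anything with patient data stays local,
    # no matter how hard the task is.
    if task.contains_patient_data:
        return "local"
    # Rule 2: only tasks explicitly flagged as needing a frontier model go out.
    if task.needs_frontier_reasoning:
        return "cloud"
    # Default: routine work runs on the machine in the office.
    return "local"

examples = [
    Task("transcribe_call", contains_patient_data=True),
    Task("classify_email"),
    Task("analyse_supplier_contract", needs_frontier_reasoning=True),
]
for task in examples:
    print(task.kind, "->", choose_backend(task))
```

Checking the privacy rule first means a task that is both sensitive and difficult still stays local, which matches the priority order described above.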
The cost comparison
Let's put real numbers on this.
| | Local AI | Cloud AI |
|---|---|---|
| Upfront cost | $5,000–8,000 (hardware) | $0 |
| Monthly cost | ~$0 (electricity negligible) | $500–2,000+ (API calls) |
| Cost after 12 months | $5,000–8,000 total | $6,000–24,000 total |
| Cost after 24 months | $5,000–8,000 total | $12,000–48,000 total |
| Data privacy | Data never leaves your office | Data sent to third-party servers |
| Internet required? | No | Yes |
| Vendor lock-in | None — you own everything | High — tied to provider's pricing and policies |
On the figures above, the break-even point falls anywhere from about 3 months (heavy use, cheaper hardware) to about 16 months (light use, dearer hardware), depending on how heavily you use AI. For a business like ours, running transcription, classification, and voice synthesis hundreds of times a day, the hardware paid for itself in under four months. After that, every month is pure savings.
And unlike cloud subscriptions, the hardware doesn't get more expensive over time. There's no "we've updated our pricing" email. No usage tiers. No surprise bills because you processed more documents than usual last month.
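The break-even arithmetic behind the table is easy to check for your own numbers. The Python sketch below uses the table's hardware and monthly ranges as inputs; the function names are ours for this example, and the local running cost defaults to zero, which is an approximation.

```python
def break_even_months(hardware_cost, monthly_cloud_cost, monthly_local_cost=0.0):
    """Months until cumulative cloud spend equals the one-off hardware spend."""
    monthly_saving = monthly_cloud_cost - monthly_local_cost
    if monthly_saving <= 0:
        raise ValueError("cloud must cost more per month than local to break even")
    return hardware_cost / monthly_saving

def cumulative_costs(hardware_cost, monthly_cloud_cost, months, monthly_local_cost=0.0):
    """Return (local_total, cloud_total) after the given number of months."""
    local_total = hardware_cost + monthly_local_cost * months
    cloud_total = monthly_cloud_cost * months
    return local_total, cloud_total

# Extremes of the ranges in the table above.
fastest = break_even_months(hardware_cost=5000, monthly_cloud_cost=2000)  # 2.5 months
slowest = break_even_months(hardware_cost=8000, monthly_cloud_cost=500)   # 16.0 months
print(f"Break-even between {fastest} and {slowest} months")

# Totals after one year for the dearest hardware and the heaviest cloud usage.
print(cumulative_costs(hardware_cost=8000, monthly_cloud_cost=2000, months=12))
```

Swap in your own hardware quote and your current monthly API bill to see where you land in that range.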
What you need to get started
You don't need a server room or an IT department. Our entire AI infrastructure is one machine on a shelf. The key requirements are:
- The right hardware. Apple Silicon (M2 Pro and above) or a decent NVIDIA GPU. The Mac Studio with M4 Max is the sweet spot for us — powerful enough to run multiple models simultaneously, quiet enough to sit in a treatment room, and energy-efficient enough that the power bill is negligible.
- The right models. Open models like Whisper (speech-to-text), Llama (language model), and various TTS models are free to download and run. No licence fees and no API keys, though each model comes with its own licence terms (Llama's community licence, for example), so read them before deploying.
- Someone to set it up. This is the honest part — setting up local AI isn't plug-and-play yet. The models need to be configured, optimised for your hardware, and integrated with your business tools. That's what we do.
Local AI isn't for everyone — and that's fine
If you're a solo operator who sends ten emails a day and makes five phone calls, cloud AI on a free tier is probably all you need. The upfront cost of local hardware doesn't make sense for very light usage.
But if you're a business with staff, with sensitive data, with hundreds of daily interactions that could be automated — local AI changes the equation entirely. You get better privacy, lower long-term costs, faster responses, and complete ownership of your tools.
We built our entire AI stack this way because it was the right decision for a healthcare practice handling patient data every day. The same logic applies to law firms, accounting practices, financial advisers, medical specialists — anyone where data privacy isn't optional and usage volume makes cloud pricing uncomfortable.
If you want to understand what local AI could look like for your business — what hardware you'd need, what it would cost, and whether it makes sense compared to cloud — get in touch. We'll walk through your situation and give you a straight answer. No jargon, no upsell. Just the maths.
Want to build something like this?
We build custom AI tools for businesses. Tell us what you're dealing with — we'll tell you what's possible.