Every UK accounting practice we’ve walked into has the same three bottlenecks. The names change. The software changes. The bottlenecks don’t.
Invoices pile up in a shared inbox. Someone spends half their Monday classifying them. Management reports get compiled by hand from three different tools because nobody trusts the export.
We’ve been shipping AI automations into UK practices for eighteen months. Here’s what worked, what didn’t, and what we’d tell ourselves before starting.
The three that paid for themselves in weeks
1. Inbox triage for invoices and receipts
Every practice has an accounts@ mailbox. Suppliers email PDFs. Clients forward receipts. Someone has to open each one, figure out whose it is, classify it, and file it.
We replaced that work with an n8n workflow that polls the mailbox every five minutes. Claude reads the attachment, extracts vendor, amount, date and VAT. The file lands in the right Drive folder with a consistent filename. An Airtable row gets created for audit. The original email is labelled and archived.
Classification accuracy sits around 96% on the real-world mix we’ve seen. The 4% that gets flagged for human review is almost always a genuinely ambiguous case (missing VAT number, unclear supplier) — exactly the kind of thing a junior bookkeeper should be looking at anyway.
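The routing logic above can be sketched in a few lines. This is an illustrative helper, not our production code: the function name, the field names, and the 0.9 confidence threshold are all assumptions, and it presumes the model returns a JSON object with a self-reported confidence score.

```python
import json
from dataclasses import dataclass

REQUIRED = ("vendor", "amount", "date", "vat")

@dataclass
class Triage:
    folder: str        # Drive folder to file under
    filename: str      # consistent filename for the PDF
    needs_review: bool # route to the junior bookkeeper instead

def triage_invoice(model_json: str, threshold: float = 0.9) -> Triage:
    """Route one extracted attachment: file it, or flag for human review."""
    data = json.loads(model_json)
    missing = [f for f in REQUIRED if not data.get(f)]
    low_conf = data.get("confidence", 0.0) < threshold
    if missing or low_conf:
        # Ambiguous case (e.g. missing VAT number): a human should look.
        return Triage("Review", data.get("vendor") or "unknown", True)
    filename = f'{data["date"]}_{data["vendor"].replace(" ", "-")}_{data["amount"]}.pdf'
    return Triage(data["vendor"], filename, False)
```

The point is that the flag-for-review branch is a first-class outcome, not an error path.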
Payback was immediate. One practice we worked with was paying a part-time bookkeeper four hours a week just to open emails. That four hours went back to higher-margin work.
2. Drafting client responses from ticket context
Clients ask the same fifteen questions. “Have you filed my VAT return?” “When’s my next deadline?” “Do I need to send you anything for the payroll run?”
A grounded assistant with read-access to the practice management system answers those directly in the client’s preferred channel. It doesn’t make stuff up because it’s querying live data, not guessing. It escalates the moment it hits something it can’t verify.
Two things made this work:
- Scoping it to a fixed question set. It answers the fifteen things, and deflects on anything else.
- Making the “I don’t know, let me put you through to your accountant” path feel natural, not apologetic.
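The scoping rule can be sketched as a match-or-deflect router. This is a simplified stand-in, not the real system: `difflib` similarity over canonical phrasings keeps the example self-contained, where the production assistant classifies intent with the model. Intent names, the 0.6 threshold, and the escalation wording are all illustrative.

```python
from difflib import SequenceMatcher

# Canonical phrasings of the fixed question set (three of the fifteen shown).
ANSWERABLE = {
    "vat_return_status": "Have you filed my VAT return?",
    "next_deadline": "When's my next deadline?",
    "payroll_docs": "Do I need to send you anything for the payroll run?",
}

# Deflection that reads as natural, not apologetic.
ESCALATION = ("That one's best answered by your accountant directly -- "
              "I'll pass it over now and they'll come back to you today.")

def route(question: str, threshold: float = 0.6):
    """Return ('answer', intent_id) for in-scope questions, else ('escalate', msg)."""
    best_id, best_score = None, 0.0
    for intent, canonical in ANSWERABLE.items():
        score = SequenceMatcher(None, question.lower(), canonical.lower()).ratio()
        if score > best_score:
            best_id, best_score = intent, score
    if best_score >= threshold:
        return ("answer", best_id)
    return ("escalate", ESCALATION)
```

Anything below the threshold escalates by default; the assistant never guesses its way into an answer it can't verify.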
3. Month-end report compilation
The report nobody wants to make. Someone pulls numbers from Xero, copies them into a template, formats them, emails the PDF.
We automated it end-to-end in a workflow that runs on the first working day of the month. The only human step left is reviewing and signing off. Total time went from two hours to fifteen minutes.
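The scheduling guard is simple enough to sketch. This version only skips weekends; a production run would also need a UK bank-holiday calendar. Function names are illustrative.

```python
from datetime import date, timedelta

def first_working_day(year: int, month: int) -> date:
    """First weekday of the month. Sat/Sun skipped; bank holidays
    would need a proper calendar in production."""
    d = date(year, month, 1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d

def should_run(today: date) -> bool:
    """Gate for a daily cron: fire only on the first working day."""
    return today == first_working_day(today.year, today.month)
```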
The two we tore out
Not everything worked. Two honest misses.
Full auto-reconciliation
We tried to push bank reconciliation all the way to automatic. The system was right 92% of the time. That 8% cost more to review, correct, and unwind than the 92% saved. We kept the “suggest a match” part and killed the “commit the match” part.
The lesson: high-stakes classification with bad reversal cost needs a human in the loop, regardless of accuracy.
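The lesson reduces to a back-of-envelope expected-cost comparison per transaction. The figures below are illustrative, not our actual costs:

```python
def auto_commit_worth_it(accuracy: float, review_cost: float,
                         unwind_cost: float) -> bool:
    """Expected cost per item: full automation vs suggest-only.
    Suggest-only: every item gets a cheap human glance.
    Full auto: no review, but each error must be found and unwound."""
    suggest_only = review_cost
    full_auto = (1 - accuracy) * unwind_cost
    return full_auto < suggest_only

# At 92% accuracy, a 50p review beats a £15 unwind on 8% of items:
# 0.08 * 15 = £1.20 expected per item vs £0.50 -- keep the human.
```

With a bad reversal cost, even very high accuracy doesn't clear the bar, which is why we kept "suggest" and killed "commit".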
Auto-generated client explanations
A draft of “here’s what your numbers mean this month” was a good idea that didn’t survive contact with clients. Two issues:
- When it was wrong, it was confidently wrong. Clients noticed.
- Partners didn’t trust it enough to send without rewriting.
If the partner has to rewrite it, it’s faster to write it from scratch. We shelved it.
What we’d tell ourselves before starting
One: pick the automation where the cost of being 4% wrong is “a human checks it” and not “we refund a client”. That rules out more than you’d think. Payroll automation is seductive. It’s also unforgiving. Invoice triage is boring. It’s also forgiving. Start with boring.
Two: instrument everything. If you can’t see what the automation classified and why, you can’t trust it. Every workflow we run logs to Airtable with the model’s reasoning attached. Date, vendor, amount, confidence score, the chain of thought that led to the classification. When something looks off six weeks later, we can audit. When a client asks “why did this get categorised as travel expense”, we have the answer. Without the log, you’re running blind and the first wrong classification burns all the trust you built in the first five hundred right ones.
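The shape of one audit row, sketched in Python. Field names are illustrative rather than our actual Airtable schema; the point is that the model's reasoning travels with the classification so the "why travel expense?" question is answerable later.

```python
from datetime import datetime, timezone

def audit_record(vendor: str, amount: str, category: str,
                 confidence: float, reasoning: str) -> dict:
    """One audit row per classification, reasoning attached."""
    return {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "vendor": vendor,
        "amount": amount,
        "category": category,
        "confidence": confidence,
        "reasoning": reasoning,  # the chain of thought behind the call
    }
```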
Three: the infrastructure matters more than the model. We host these on our own n8n stack on Ubuntu, with Postgres for persistence and Redis for queue throughput. Claude is the brain; the orchestration is the nervous system. If the orchestration is flaky, model quality is irrelevant. We’ve seen clients burn six months on “should we use Claude or GPT-4” and then discover their actual problem was a Zapier flow that silently retries failed webhooks for two hours before giving up. Fix the plumbing first. The model is rarely your bottleneck.
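The plumbing fix is mostly about failing loudly instead of retrying silently. A minimal sketch, assuming a `send` callable that reports success; the attempt cap, backoff numbers, and dead-letter hook are all illustrative:

```python
import time

def deliver(send, payload, max_attempts: int = 5,
            base_delay: float = 1.0, on_dead_letter=print) -> bool:
    """Retry with exponential backoff, then surface the failure.
    The anti-pattern is the silent two-hour retry loop: a bounded
    cap plus a dead-letter hook makes failures visible instead."""
    for attempt in range(max_attempts):
        if send(payload):
            return True
        time.sleep(base_delay * 2 ** attempt)
    on_dead_letter(payload)  # fail loudly; never swallow the payload
    return False
```

A bounded retry with a visible dead-letter path is boring, which is exactly why it beats arguing about models.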
Four: document the “why nots” as aggressively as the “whys”. Every project we run has a short README explaining which automations we deliberately did NOT build, and the reason. Six months later when someone new joins, or when the founder asks “why isn’t X automated”, the answer is in the repo. Without it, you end up re-litigating the same decisions quarterly.
If you’re considering this
The honest bar: you need a process that’s already documented (or documentable), that recurs at least weekly, where the failure mode is a flag-for-review rather than an irreversible action. If you can’t describe the process on a whiteboard in ten minutes, you can’t automate it yet. The automation will only ever be as clear as the human version that came before it.
If you’ve got that, the automation pays for itself inside a month. If you don’t, the automation becomes another tool nobody maintains. We’ve seen both. The difference is almost always in how well the client understood their own process before they called us. The practices that succeeded came in saying “here’s exactly what our bookkeeper does, minute by minute”. The ones that struggled came in saying “we want to be more efficient”.
The other thing worth saying: automation isn’t the point. Your clients don’t care whether an invoice was classified by a human or by Claude. They care whether their books are right, their VAT is filed on time, and their partner takes their call when they have a question. Everything we’ve built is in service of those three things. Move the clerical work off the partner’s desk so the partner has more time for the conversation the client actually wants.
We build these end to end for UK practices. If you want a 30-minute call to walk through what your first one should be, honestly and specifically, get in touch. No workshop decks, no proposals until we know it’s worth building. Just a conversation about your process and whether we’d be the right people to help.