✨ TL;DR

Helped a UK travel company build an AI agent for end-to-end trip planning. Talked them out of a trendy swarm architecture and shipped a simpler approach instead. Saved ~1 month of engineering time. Set up evals so the team can iterate systematically.


🎯 The Problem

The client wanted to build an AI agent that could plan entire trips — flights, hotels, activities, everything. Users would describe what they’re looking for, and the agent would handle it all, with a human-in-the-loop flow for the actual booking step.

Here’s where it got interesting: their engineers wanted to build a swarm architecture. It was the hot thing at the time. Multiple specialized agents coordinating with each other, very sophisticated.

The problem? Complex architectures have too much surface area. They’re hard to build, hard to debug, and even harder to iterate on. When something breaks, you don’t know where to look.


💡 The Solution

Start simple. Add complexity only when you need it.

I convinced them to begin with a single agent + tool calls. No fancy multi-agent orchestration. Just one agent that could call the right tools to search flights, check hotels, and book activities.
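
To make that concrete, here’s roughly the shape of the core loop, sketched with an OpenAI-style chat completions API. The tool names and schemas (search_flights and friends) are illustrative stand-ins, not the client’s real integrations:

```python
import json

from openai import OpenAI

client = OpenAI()

# One tool schema shown; search_hotels and search_activities follow the same shape.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_flights",
            "description": "Search flights between two cities on a date.",
            "parameters": {
                "type": "object",
                "properties": {
                    "origin": {"type": "string"},
                    "destination": {"type": "string"},
                    "date": {"type": "string", "description": "YYYY-MM-DD"},
                },
                "required": ["origin", "destination", "date"],
            },
        },
    },
]

def run_tool(name: str, args: dict) -> str:
    # In the real system this dispatched to the flight/hotel/activity
    # services; stubbed here with a canned result.
    if name == "search_flights":
        return json.dumps({"flights": [{"flight": "XX123", "price_gbp": 89}]})
    raise ValueError(f"unknown tool: {name}")

def run_agent(user_request: str) -> str:
    messages = [
        {"role": "system", "content": "You are a travel-planning assistant."},
        {"role": "user", "content": user_request},
    ]
    while True:
        resp = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=TOOLS
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content  # no tool requested: this is the final answer
        messages.append(msg)  # keep the assistant turn in the history
        for call in msg.tool_calls:
            result = run_tool(call.function.name, json.loads(call.function.arguments))
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": result}
            )
```

That’s the whole architecture: one loop, one message history. When something breaks, you know exactly where to look.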

We shipped fast and covered most of their business use cases with this approach.

Only when it actually started breaking did we add a supervisor layer — and even then, it was minimal. Not a full swarm, just enough coordination to handle the edge cases.


🛠️ What I Built

Phase 1: Simple Agent

Single agent with tool calls. Connected to real services: flights, hotels, activities. Human-in-the-loop workflow for booking confirmation.

This got us to a working product fast.
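
The human-in-the-loop part was deliberately boring: a gate in front of any tool that spends money. A sketch, layered on run_tool from the loop above; the book_ prefix convention is illustrative:

```python
import json

# Tools matching these prefixes are held for explicit user confirmation.
CONFIRMATION_REQUIRED = ("book_",)

def run_tool_with_confirmation(name: str, args: dict) -> str:
    if name.startswith(CONFIRMATION_REQUIRED):
        print(f"About to call {name} with {args}")
        if input("Confirm booking? [y/N] ").strip().lower() != "y":
            # The refusal goes back to the model as the tool result, so the
            # agent can replan instead of failing silently.
            return json.dumps({"status": "cancelled", "reason": "user declined"})
    return run_tool(name, args)
```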

Phase 2: Supervisor (when needed)

When the single agent started hitting its limits, we added a lightweight supervisor. But we only did this after we had real evidence it was needed — not before.
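
“Lightweight” meant something like this: one routing call that hands the request to a specialist, not a mesh of agents negotiating with each other. A sketch, reusing the client and run_agent from the Phase 1 loop; the three-way specialist split and the routing prompt are illustrative:

```python
# Each specialist is the Phase 1 loop with a narrower system prompt and
# tool list; shown here as the same run_agent for brevity.
SPECIALISTS = {
    "flights": run_agent,
    "hotels": run_agent,
    "activities": run_agent,
}

def supervise(user_request: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "Classify this travel request as one of: "
                           "flights, hotels, activities. Reply with one word.",
            },
            {"role": "user", "content": user_request},
        ],
    )
    route = resp.choices[0].message.content.strip().lower()
    # Unknown routes fall back to the general Phase 1 agent.
    return SPECIALISTS.get(route, run_agent)(user_request)
```

The entire supervisor is one classification call and a dict lookup, which turned out to be about as much coordination as the edge cases needed.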

Phase 3: Evals

This is the part that pays dividends long after I’m gone.

I integrated an eval framework into their workflow (there’s a sketch of the harness after this list). Now:

  • Bad prompt changes get caught before shipping
  • Engineers can test new models quickly when they come out
  • They can measure cost/latency/accuracy trade-offs systematically
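
Boiled down, the harness looks something like this: golden cases, simple graders, latency tracked, run in CI on every prompt change. The cases and the substring grader here are illustrative; the real suite used richer checks, and cost came from token usage per run:

```python
import time

# Golden cases with a deliberately simple substring grader.
GOLDEN_CASES = [
    {"prompt": "Weekend in Lisbon in June for two, mid-range budget.",
     "must_mention": ["flight", "hotel"]},
    {"prompt": "Five nights in Edinburgh with two kids.",
     "must_mention": ["hotel", "activit"]},
]

def run_evals(agent) -> None:
    passed = 0
    for case in GOLDEN_CASES:
        start = time.perf_counter()
        answer = agent(case["prompt"]).lower()
        latency = time.perf_counter() - start
        ok = all(term in answer for term in case["must_mention"])
        passed += ok
        print(f"{'PASS' if ok else 'FAIL'}  {latency:5.1f}s  {case['prompt'][:40]}")
    print(f"{passed}/{len(GOLDEN_CASES)} passed")

# Wired into CI, so every prompt change runs: run_evals(run_agent)
```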

Knowledge Transfer

Jumped into their codebase, shipped the work myself, then taught their engineers the patterns. The goal was always to set them up to continue without me.


📊 Results

  • ~1 month saved by avoiding premature swarm architecture
  • Evals = sustainable quality without manual review
  • Team can now independently test new models and iterate on agent performance
  • Stayed on for a few months in an advisory role to help when they got stuck

🧠 What I Learned

Start simple, add complexity later. Everyone wants to build the sophisticated thing first. But you don’t know what complexity you actually need until you hit real problems. Ship the simple version, then evolve.

Evals are infrastructure. Most teams treat evals as a nice-to-have. But baking them into the workflow means quality compounds over time. Bad changes get caught automatically. New models get tested systematically. It’s the gift that keeps on giving.