I was doing AI before it was cool

In 2016 I ran an innovation lab inside Société Générale. The word “AI” was in the air, but it meant something narrower than it does today: optical character recognition, a bit of computer vision, natural language classifiers wired into a chatbot. We trained models on PyTorch and TensorFlow, argued about GPUs we could barely get budget for, and shipped things that a bank’s compliance team had never seen before.

One of those things was a chatbot we called YODA, “your own digital assistant.” It answered internal support questions that used to sit in a queue for a human. It went live years before ChatGPT taught everyone the word “prompt.” It was not magic. It was intent classification, a retrieval layer, and a lot of unglamorous work on the data behind it. It also took real load off the support teams, which is the only metric that ever mattered.

Around it we built other things that now sound quaint and were hard at the time: OCR pipelines on top of Tesseract, computer-vision experiments, even a couple of Pepper robots that we deployed to see what physical presence added. We ran predictive-maintenance projects on IoT data. Most of it worked. Some of it did not. All of it taught the same lesson.

The model was never the hard part

Every wave of AI arrives with the same story: the model is the breakthrough, and everything else is a detail. It is the reverse. The model is the easy part. It is a download, a paper, an API call. The hard part is everything around it.

It is the data, which is dirty, and which nobody owns cleanly. It is production, where the interesting failure modes live and where “it worked in the notebook” means nothing. It is trust, because a bank will not put a black box in front of a customer, and it should not. It is the operating cost, which is real and recurring and easy to hand-wave away in a demo.

I spent those years learning to build the parts that are not the model. That is what made the difference later.

2025, same lesson, bigger model

Fast forward to Chantelle. Generative AI had arrived, the models were extraordinary, and the temptation was to treat them as the answer to every question. We did not.

We put retrieval-augmented generation into production on a boring, dependable stack: PostgreSQL with pgvector for retrieval, Gemini for generation, n8n to orchestrate the ingestion and the workflows, Valkey for caching. We spent most of our effort on document ingestion, data quality, and the integrations into Magento, because that is where the value and the risk both lived. We rolled the tools out to a handful of engineers first, then to the wider team once the tools had earned their place.
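To make the retrieval step concrete, here is a minimal sketch in plain Python. It is an illustration, not the production code: the chunks, the three-dimensional "embeddings", and the function names are all made up, and the ranking is done in memory. With pgvector the same ranking would typically be a single query ordered by the cosine-distance operator, along the lines of `ORDER BY embedding <=> :query LIMIT k`.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=2):
    """Return the text of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

def build_prompt(question, passages):
    """Assemble the prompt that would be sent to the generation model."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical chunks with toy embeddings; real ones come from an embedding model.
CHUNKS = [
    ("Returns are accepted within 30 days.", [0.9, 0.1, 0.0]),
    ("Shipping takes 3 to 5 working days.", [0.1, 0.9, 0.0]),
    ("Gift cards never expire.", [0.0, 0.1, 0.9]),
]

if __name__ == "__main__":
    passages = top_k([1.0, 0.0, 0.0], CHUNKS, k=2)
    print(build_prompt("What is the returns policy?", passages))
```

Notice how little of this is about the model. Everything that decides the answer quality happens before the prompt is sent: what was ingested, how it was chunked, and what the retrieval returns.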

The model changed. The lesson did not. Retrieval quality beat prompt cleverness. Evaluation beat vibes. The unglamorous plumbing decided whether the thing was dependable or just impressive in a demo.
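"Evaluation beat vibes" can be as simple as a small labelled set and one number. The sketch below computes recall@k for a retriever: of the documents a human marked as relevant for a question, how many show up in the top k results. The function names, the test cases, and the stand-in retriever are hypothetical; the point is only that retrieval quality can be tracked with a few lines of code rather than by impression.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k retrieved."""
    if not relevant_ids:
        return 0.0
    hits = set(retrieved_ids[:k]) & set(relevant_ids)
    return len(hits) / len(relevant_ids)

def mean_recall_at_k(cases, retrieve, k):
    """Average recall@k over (question, relevant_ids) pairs.

    `retrieve` is any callable that maps a question to a ranked list of ids.
    """
    scores = [recall_at_k(retrieve(question), relevant, k) for question, relevant in cases]
    return sum(scores) / len(scores)

# Hypothetical labelled cases: question -> ids a human judged relevant.
CASES = [
    ("returns policy", ["doc-returns"]),
    ("delivery time", ["doc-shipping", "doc-carriers"]),
]

# Stand-in retriever with canned rankings, used here only to exercise the metric.
CANNED = {
    "returns policy": ["doc-returns", "doc-giftcards", "doc-shipping"],
    "delivery time": ["doc-shipping", "doc-returns", "doc-carriers"],
}

def fake_retrieve(question):
    return CANNED[question]

if __name__ == "__main__":
    print(mean_recall_at_k(CASES, fake_retrieve, k=2))
```

Run the same cases after every change to chunking, embeddings, or ranking, and the question "did that help?" has an answer.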

What this means if you are evaluating AI now

If you lead a team that is being asked to “do something with AI,” here is what a decade of it has taught me.

Start from a problem that costs you real money or real time, not from the technology. Measure the thing you actually care about, before and after. Assume the model is the cheapest part of the system and budget your effort accordingly: data, evaluation, and production are where the work is. Ship to a small group, learn, then widen. And be honest about cost, because an AI feature that quietly triples your inference bill is not a win.
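Being honest about cost mostly means doing the arithmetic before the demo, not after the invoice. A rough sketch, with every number a placeholder: the traffic, the token counts, and the per-million-token prices below are invented for illustration and are not any provider's actual pricing.

```python
def monthly_inference_cost(
    requests_per_day,
    input_tokens_per_request,
    output_tokens_per_request,
    price_in_per_million,
    price_out_per_million,
    days=30,
):
    """Estimated monthly spend, in the same currency as the prices."""
    cost_per_request = (
        input_tokens_per_request * price_in_per_million
        + output_tokens_per_request * price_out_per_million
    ) / 1_000_000
    return requests_per_day * days * cost_per_request

if __name__ == "__main__":
    # Placeholder figures: 1,000 requests a day, a short prompt, a short answer.
    baseline = monthly_inference_cost(1_000, 2_000, 500, 1.0, 4.0)
    # Same feature after "just adding more context" to every prompt.
    with_more_context = monthly_inference_cost(1_000, 10_000, 500, 1.0, 4.0)
    print(f"baseline: {baseline:.2f}")
    print(f"with more context: {with_more_context:.2f}")
    print(f"ratio: {with_more_context / baseline:.1f}x")
```

In this made-up case, stuffing five times more context into each prompt triples the bill without a single extra user. That is the kind of change that looks free in a notebook.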

None of that is new. I was applying it in 2016 with models that would embarrass a modern phone. The tools got better. The discipline is what carries over.

That discipline is what I bring to the teams I work with now, whether that is as their engineering leader or as a second opinion when they are deciding what is worth building.