40% Better, 75% Faster: How Frigade Slashed Latency & Boosted User Helpfulness

Tags: ai, rag, evals

Published: May 20, 2025

Frigade faced a critical challenge: users were complaining about their AI assistant’s performance.

That familiar sinking feeling. They knew the AI worked—sometimes. But they were flying blind, relying on user feedback and “vibes” to understand performance. Sound familiar?

Partnering with Wicked Data, Frigade implemented evaluation-driven development to transform their AI from guesswork to systematic knowing. For this B2B SaaS company, where every user interaction matters and churn is costly, the results were game-changing: a 40% improvement in response acceptability and a 75% reduction in user-perceived latency, with a team that could now confidently answer “That is a known failure mode” instead of scrambling for explanations.

The Challenge: Flying Blind with AI Performance

Frigade’s engineering team faced the classic “black box” problem that plagues AI implementations: no systematic way to measure how the assistant was actually performing, only user complaints and “vibes.”

This scenario breeds the familiar cycle: reactive firefighting instead of proactive improvement. Frigade’s team was talented, but they were flying blind. They needed to move from “vibes-based” AI development to systematic evaluation that connected performance to business outcomes.

The Solution: Evaluation-Driven Development

Frigade’s ambition was an AI assistant that actively shows users how to accomplish tasks, rather than merely pointing them to documentation. But first, they needed to stop flying blind.

Wicked Data provided the systematic evaluation framework that transformed their AI development from guesswork to knowing. The key wasn’t just building better AI—it was building the systematic process to measure, diagnose, and improve AI performance consistently.

The Key Unlock: From Guesswork to Systematic Knowing

The engagement’s cornerstone was implementing evaluation-driven development (EDD)—the systematic approach that replaces “vibes-based” AI development with data-driven confidence.

When you’re flying blind with AI performance, every bug feels like a mystery and every improvement feels like luck. EDD provides the systematic framework to measure progress, diagnose failures, and concentrate efforts on changes that actually matter.

As detailed in “AI Observability: You can’t fix what you can’t see,” you can’t optimize what you can’t measure systematically. This is the foundation that transforms engineering teams from reactive firefighters to proactive builders.

Wicked Data instituted a robust evaluation process including:

  1. Representative Query Sets: Curated queries and inputs reflecting genuine user scenarios.
  2. Automated System Testing: A custom CLI script to automate query testing against live system versions (see the first sketch after this list).
  3. Trace Recording: Braintrust integration for detailed AI interaction trace recording—vital for AI observability.
  4. Annotation & Acceptability Scoring: A Braintrust-configured process for annotating traces to assess AI response acceptability.
  5. Performance Metrics & Reporting: Custom CLI scripts to download annotated logs and compute key metrics, creating a version-controlled “performance snapshot” in Git for trend analysis (a second sketch below illustrates this step).
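
Here is a minimal sketch of what such a harness can look like. The assistant endpoint, payload shape, and query file are hypothetical stand-ins for Frigade’s actual system; the Braintrust calls follow its Python SDK, but treat the exact arguments as assumptions to verify against the current docs.

```python
#!/usr/bin/env python3
"""Hypothetical eval harness: replay a curated query set against a live
assistant endpoint and record each interaction as a Braintrust log entry."""
import json
import time

import requests
from braintrust import init_logger  # pip install braintrust

ASSISTANT_URL = "https://api.example.com/assistant"  # hypothetical endpoint
QUERY_SET = "eval/queries.jsonl"  # curated, representative user queries

# Assumed Braintrust SDK entry point; check against current docs.
logger = init_logger(project="assistant-evals")

with open(QUERY_SET) as f:
    for line in f:
        query = json.loads(line)
        start = time.monotonic()
        resp = requests.post(
            ASSISTANT_URL, json={"message": query["input"]}, timeout=60
        )
        resp.raise_for_status()
        latency_s = time.monotonic() - start
        # One log entry per interaction; annotators score acceptability
        # later in the Braintrust UI.
        logger.log(
            input=query["input"],
            output=resp.json().get("answer"),
            metadata={
                "system_version": query.get("version", "live"),
                "latency_s": latency_s,
            },
        )
```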

This framework facilitated granular analysis of key metrics such as response acceptability and user-perceived latency.
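
And here is a minimal sketch of the snapshot step (item 5 above). The file paths, the annotation field name, and the snapshot layout are illustrative assumptions, not Frigade’s actual schema.

```python
#!/usr/bin/env python3
"""Hypothetical snapshot script: turn annotated eval logs into a small,
version-controlled performance snapshot committed to Git for trend analysis."""
import json
import statistics
from pathlib import Path

LOGS = Path("eval/annotated_logs.jsonl")       # exported from the trace store
SNAPSHOT = Path("eval/snapshots/latest.json")  # tracked in Git

records = [json.loads(l) for l in LOGS.read_text().splitlines() if l.strip()]
assert records, "no annotated logs found"
latencies = sorted(r["latency_s"] for r in records if "latency_s" in r)

snapshot = {
    "n_queries": len(records),
    # "acceptability" is a hypothetical annotation field.
    "acceptability_rate": round(
        sum(r.get("acceptability") == "acceptable" for r in records)
        / len(records), 3
    ),
    "latency_p50_s": round(statistics.median(latencies), 2),
    "latency_p95_s": round(latencies[int(0.95 * (len(latencies) - 1))], 2),
}
SNAPSHOT.parent.mkdir(parents=True, exist_ok=True)
SNAPSHOT.write_text(json.dumps(snapshot, indent=2) + "\n")
print(json.dumps(snapshot, indent=2))
```

Committing the snapshot alongside the code means every change to the system ships with a before-and-after record, which is exactly what makes trend analysis possible.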

From Insights to Impact: Iterative Improvements

This robust evaluation system enabled Wicked Data and Frigade to meticulously “follow the breadcrumbs” in the data, pinpointing critical areas for enhancement and driving targeted, iterative improvements across the assistant.
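
To make “following the breadcrumbs” concrete, here is one simple diagnostic in that spirit: tally annotator-assigned failure categories to see where to focus next. The failure_category field is a hypothetical annotation label, not Frigade’s actual taxonomy.

```python
import json
from collections import Counter
from pathlib import Path

records = [
    json.loads(l)
    for l in Path("eval/annotated_logs.jsonl").read_text().splitlines()
    if l.strip()
]

# Tally annotator-assigned failure categories on unacceptable responses.
failures = Counter(
    r.get("failure_category", "uncategorized")
    for r in records
    if r.get("acceptability") != "acceptable"
)
for category, count in failures.most_common():
    print(f"{count:4d}  {category}")
```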

Measurable Results: A Leap in AI Performance & User Experience

The systematic application of EDD and targeted refinements yielded significant, quantifiable outcomes: a 40% improvement in response acceptability and a 75% reduction in user-perceived latency.

These advancements directly translated into a vastly improved user experience, minimizing frustration and empowering Frigade’s users to achieve their objectives with greater ease and speed.

Empowering Frigade for Ongoing Success: Mentorship and Knowledge Transfer

Beyond the immediate technical deliverables, Wicked Data emphasized mentoring and knowledge transfer to the Frigade team. The engagement left Frigade with the skills and processes to run their evaluation cycles independently, so the team can now pair its deep product understanding with the framework to generate new insights and continuously iterate on their AI system.

Conclusion: From Sinking Feeling to Confident Answers

Frigade’s success story illustrates the transformation every engineering leader craves: moving from that sinking feeling of flying blind to the confidence of systematic knowing.

Before: “Why are users complaining about the AI assistant?”

After: “That is a known failure mode. And here’s the dashboard showing exactly how, why, and what we’re improving next.”

This transformation is possible for your team. Through evaluation-driven development, you can stop flying blind and start building AI with predictable, measurable outcomes.

Ready to Stop Guessing and Start Knowing?

If you want to build systematic AI evaluation that transforms your team from reactive firefighters to proactive builders, I can help you implement this framework in just one week.

My intensive workshop walks your engineering team through building a robust evaluation system using your own data, in your own codebase: the same approach that helped Frigade achieve 40% better response acceptability and a 75% reduction in user-perceived latency.

Schedule a free consultation to discuss your team’s specific challenges →