Northwind is a fictional B2B SaaS company with about 200 employees. They're a Salesforce shop. And like most Salesforce shops, their CRM had slowly rotted. Fields half-filled. Duplicate accounts scattered across the org. Follow-ups on stale opportunities slipping through the cracks. The VP of RevOps had a choice: hire another person to manage the cleanup, or try something new.
They called us. They had two weeks to prove the CRM Specialist agent could fix it.
Week 1: The Setup
We connected the agent to a Salesforce sandbox on day one and added a Slack integration on day two. The setup was straightforward, not because the agent magically "just works," but because we defined three specific jobs with clear boundaries:
- Enrich new leads: Pull in company metadata from external sources (Apollo, LinkedIn), flag incomplete fields, suggest enrichments.
- Dedup accounts: Run fuzzy matching on company names, detect overlaps, flag them for human review.
- Draft follow-ups: Identify stale opportunities (no activity in 30+ days), draft personalized follow-up emails using the rep's historical voice.
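To make the second and third jobs concrete, here is a minimal sketch of fuzzy name matching and stale-opportunity detection. It is not Northwind's implementation: the suffix list, the 0.9 similarity threshold, the `Opportunity` shape, and all function names are assumptions for illustration, and it uses Python's standard-library `difflib` rather than a dedicated matching service.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from difflib import SequenceMatcher
from itertools import combinations
import re

# Hypothetical values; tune them against real account data.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}
SIMILARITY_THRESHOLD = 0.9
STALE_AFTER_DAYS = 30  # "no activity in 30+ days"


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def likely_duplicates(names, threshold=SIMILARITY_THRESHOLD):
    """Return (name_a, name_b, score) for pairs to flag for human review."""
    flagged = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            flagged.append((a, b, round(score, 2)))
    return flagged


@dataclass
class Opportunity:
    name: str
    last_activity: date


def stale_opportunities(opps, today, max_age_days=STALE_AFTER_DAYS):
    """Return opportunities whose last activity is max_age_days or more ago."""
    cutoff = today - timedelta(days=max_age_days)
    return [o for o in opps if o.last_activity <= cutoff]
```

Called on `["Acme Inc.", "ACME, Inc", "Globex Corp"]`, `likely_duplicates` flags only the two Acme variants. Pairwise comparison is quadratic in the number of accounts, so a real org would likely group candidates first (for example by website domain) before scoring.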
We set human-in-the-loop (HITL) gates on the first 1,000 actions. Every enrichment suggestion, every dedup flag, every draft went to a Slack channel where a RevOps person approved or rejected it. The cost of a wrong suggestion was low, but we wanted the team to build muscle memory with the agent before it earned any autonomy.
The First Numbers
By the end of week two, the agent had processed 8,400 records. It suggested 1,200 enrichments. Sales reps accepted 94% of them. It flagged 87 duplicate accounts. All 87 were confirmed. And each sales rep who had the agent wired into their workflow got five hours back per week: time they now spent talking to customers instead of chasing data in Salesforce.
"It's the first sales tool we've deployed where adoption was a problem we wanted to have." — VP of RevOps, Northwind
The Surprise
We expected pushback from sales reps. Agents are scary. They touch your data. But the opposite happened. Reps didn't resent the agent. They treated it like a junior coworker. Two of them started asking for more autonomy, requesting that we graduate the approval gates on enrichment work.
By week three, we did. For enrichments scored above a confidence threshold (0.92), the agent could write directly to Salesforce without human sign-off. The gate stayed on duplicate detection and follow-up drafts, because those have higher stakes.
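The week-three policy can be sketched as a small routing function. Only the 0.92 threshold and the three action categories come from the rollout described above; the class names, the `route` function, and the way confidence is attached to an action are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ActionType(Enum):
    ENRICHMENT = "enrichment"
    DEDUP_FLAG = "dedup_flag"
    FOLLOW_UP_DRAFT = "follow_up_draft"


class Route(Enum):
    AUTO_WRITE = "auto_write"      # write to Salesforce without sign-off
    HUMAN_REVIEW = "human_review"  # post to the Slack approval channel


# 0.92 is the threshold quoted above; everything else here is an assumption.
ENRICHMENT_CONFIDENCE_THRESHOLD = 0.92
AUTONOMOUS_TYPES = {ActionType.ENRICHMENT}


@dataclass(frozen=True)
class ProposedAction:
    action_type: ActionType
    confidence: float


def route(action: ProposedAction) -> Route:
    """Auto-write only ungated action types scored above the threshold."""
    if (action.action_type in AUTONOMOUS_TYPES
            and action.confidence > ENRICHMENT_CONFIDENCE_THRESHOLD):
        return Route.AUTO_WRITE
    return Route.HUMAN_REVIEW
```

Defaulting to `HUMAN_REVIEW` means any new action type stays gated until someone deliberately adds it to `AUTONOMOUS_TYPES`.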
Three Months In
The metrics crystallized:
- Sales-ops workload: Down 40% (measured by tickets to RevOps). The VP now spends time on strategy instead of data janitor work.
- CRM data quality score: Up from 62 to 89 (they use a custom scoring model on field completeness and deduplication).
- Team expansion: One sales rep who had been drowning in admin work was promoted to a strategic account manager role—time freed by the agent made that possible.
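Northwind's scoring model is custom and its formula isn't reproduced here. As a rough illustration of how a 0-100 score built on field completeness and deduplication might be computed, here is a sketch in which the required fields, the 70/30 weights, and the function names are all assumptions.

```python
# Hypothetical required fields and weights; not Northwind's actual model.
REQUIRED_FIELDS = ("name", "industry", "website", "owner")
COMPLETENESS_WEIGHT = 0.7
UNIQUENESS_WEIGHT = 0.3


def completeness(records):
    """Fraction of required fields that are filled across all records."""
    if not records:
        return 0.0
    filled = sum(1 for r in records for f in REQUIRED_FIELDS if r.get(f))
    return filled / (len(records) * len(REQUIRED_FIELDS))


def quality_score(records, duplicate_count):
    """Weighted blend of completeness and uniqueness on a 0-100 scale."""
    if not records:
        return 0.0
    uniqueness = 1 - min(duplicate_count, len(records)) / len(records)
    blended = (COMPLETENESS_WEIGHT * completeness(records)
               + UNIQUENESS_WEIGHT * uniqueness)
    return round(100 * blended, 1)
```

Whatever the real weights are, pinning the formula down in code keeps a before-and-after comparison like 62 to 89 meaningful, because both numbers come from the same calculation.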
What Didn't Work
The agent's first attempts at follow-up emails were templated. Clinical. They read like they were written by a machine. (They were.) The team gave us feedback on tone. We iterated on the system prompt twice, added a small fine-tune on Northwind's email corpus, and by iteration three, the emails looked like they could have come from a rep.
That's the work that doesn't show up in the case study: the prompt engineering, the feedback loops, the tweaks to confidence thresholds. It's not glamorous, but it's real.
Three Lessons
Northwind's rollout offers three lessons for deploying agents into regulated, data-sensitive environments:
- Start with HITL. Humans + agents > agents alone. The approval gates in week one weren't overhead. They were the trust-building layer that made week three possible.
- Graduate gates with data. Don't hand off autonomy on ideology. Use confidence scores, error rates, rep feedback. When the data says "this category is safe," move the gate.
- Expect the team's relationship with the agent to evolve. Northwind's reps went from skeptical to collaborative to demanding more. That's not a bug in the deployment—it's a sign it's working.
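The second lesson, graduating gates with data, can be written down as an explicit rule. This is a sketch rather than the rule Northwind used: the minimum sample size (500 reviewed actions) and the minimum approval rate (90%) are hypothetical, and higher-stakes categories would likely warrant stricter values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateStats:
    category: str
    reviewed: int  # actions a human has approved or rejected
    approved: int

    @property
    def approval_rate(self) -> float:
        return self.approved / self.reviewed if self.reviewed else 0.0


def ready_to_graduate(stats: GateStats,
                      min_reviewed: int = 500,            # hypothetical
                      min_approval_rate: float = 0.90):   # hypothetical
    """A category graduates only with enough reviews AND a high approval rate."""
    return (stats.reviewed >= min_reviewed
            and stats.approval_rate >= min_approval_rate)
```

Plugging in the week-two figures, enrichments (1,200 suggested, 94% accepted) would clear this bar, while the 87 duplicate flags would not: a perfect confirmation rate on a small sample isn't yet enough evidence under these assumed thresholds.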