We did not start with a model. We started with a clipboard. Two days of sitting next to the team, watching what they actually did with each comment. Out of about three hundred responses we read together, four templates covered ninety percent of the work. The other ten percent was where the judgment lived. That ratio became the architecture of the agent: classify, draft from a template, hand the rest to a human with three suggested replies and the relevant code cited.
The system that runs in production is dull on purpose. A scheduled scrape, a small classifier we fine-tuned on the developer's prior responses, a templated drafter, and a simple Slack queue for the cases that need a person. No platform, no orchestration framework, no proprietary model. The whole thing fits on a page of architecture.
Eight weeks in, two of the six projects on the watchlist closed ahead of their original schedules. The carrying-cost savings on those alone covered the engagement many times. The PM has not opened the portal directly in two months. The agent has missed zero comments. It still flags the rare weird one for a human, exactly the way we wrote it to.