The problem
Eazel forms are the workhorse of college admissions on the CollegeNET platform. They handle thousands of application questions across hundreds of universities, with deeply conditional logic, branching workflows, and validation rules that take experienced developers days to assemble. The bottleneck wasn't form complexity; it was the gap between a customer's spec ("here's our 2025 application") and the structured form configuration that would power it.
The premise
I had a hypothesis worth testing on personal time: an LLM with the right context could draft most of a form configuration from natural language, and the editor's existing validation rules could catch what it got wrong. I built a prototype as a side project, demoed it to engineering leadership, and then presented it to the whole company. The response was strong enough to put it on the roadmap.
How it worked
- Structured output, not freeform text. The LLM produced typed Eazel form JSON, so the editor's existing schema caught structural errors at the type level.
- Differential edits, not full rewrites. Early versions regenerated the whole form on every change, which burned tokens. I refactored the pipeline so the model emitted only a diff against the current form state.
- RAG with a vector database. Large application forms exceeded the model's context window, so I broke them into semantically meaningful chunks (sections, question groups), embedded them, and retrieved only the chunks relevant to each request.
- Human-in-the-loop refinement. Every AI-generated change had to be reviewed by a form developer before it could go live. Approve, reject, or edit: three buttons, low friction.
- Fine-tuning on rejections. Rejected changes became training data. A surprisingly small set of corrections materially improved how the model handled business logic that the base model consistently got wrong.
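The structured-output and differential-edit ideas can be sketched together. This is a minimal illustration under stated assumptions, not the production code: the real Eazel schema isn't shown in this write-up, so the `Form`/`Section`/`Question` shapes, the `add`/`update`/`remove` operation names, the allowed question types, and the `apply_diff` and `validate` helpers are all hypothetical. It shows the model emitting a short list of typed edit operations instead of a whole form, with a schema check running on the result before anything reaches a reviewer.

```python
from copy import deepcopy
from typing import Any, TypedDict


# Hypothetical shapes: the real Eazel form schema is not shown in this write-up,
# so these types only illustrate "typed form JSON" and a "typed diff".
class Question(TypedDict):
    id: str
    type: str  # e.g. "text", "select", "date"
    label: str
    required: bool


class Section(TypedDict):
    id: str
    questions: list[Question]


class Form(TypedDict):
    sections: list[Section]


ALLOWED_TYPES = {"text", "select", "date"}  # hypothetical question types


def validate(form: Form) -> list[str]:
    """Return schema errors; an empty list means the form passed."""
    errors: list[str] = []
    seen: set[str] = set()
    for section in form["sections"]:
        for q in section["questions"]:
            if q["id"] in seen:
                errors.append(f"duplicate question id {q['id']!r}")
            seen.add(q["id"])
            if q["type"] not in ALLOWED_TYPES:
                errors.append(f"{q['id']}: unknown type {q['type']!r}")
    return errors


def apply_diff(form: Form, ops: list[dict[str, Any]]) -> Form:
    """Apply model-emitted edit operations to a copy of the current form.

    Raises ValueError if an operation is malformed or the result fails
    validation; the caller's form is never modified.
    """
    new_form = deepcopy(form)
    sections = {s["id"]: s for s in new_form["sections"]}
    for op in ops:
        if op["section"] not in sections:
            raise ValueError(f"unknown section {op['section']!r}")
        questions = sections[op["section"]]["questions"]
        if op["op"] == "add":
            questions.append(deepcopy(op["question"]))
        elif op["op"] == "update":
            matches = [q for q in questions if q["id"] == op["id"]]
            if not matches:
                raise ValueError(f"no question {op['id']!r} in {op['section']!r}")
            matches[0].update(op["fields"])
        elif op["op"] == "remove":
            questions[:] = [q for q in questions if q["id"] != op["id"]]
        else:
            raise ValueError(f"unknown op {op['op']!r}")
    errors = validate(new_form)
    if errors:
        raise ValueError("; ".join(errors))
    return new_form
```

Because `apply_diff` works on a copy and raises on any schema error, a malformed suggestion never touches the current form, and a diff of a few operations is far smaller than a full form to generate.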
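The chunk-and-retrieve step can be sketched in the same spirit. This assumes a form nested as sections containing question groups; the `embed` function is a deliberately crude stand-in (a hashed bag of words) for whatever embedding model was actually used, and a plain Python list stands in for the vector database. The `Chunk`, `chunk_form`, and `retrieve` names are hypothetical. What the sketch illustrates is the shape: one chunk per question group, prefixed with its section title so a retrieved chunk keeps its context, and cosine similarity to pick the top `k` chunks for a request.

```python
import math
import re
import zlib
from dataclasses import dataclass

DIM = 1024  # size of the toy embedding space


def embed(text: str) -> list[float]:
    """Toy stand-in for a real embedding model: a hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class Chunk:
    section: str
    group: str
    text: str
    vector: list[float]


def chunk_form(form: dict) -> list[Chunk]:
    """Split a form into one chunk per question group, prefixed with its section title."""
    chunks = []
    for section in form["sections"]:
        for group in section["groups"]:
            labels = "; ".join(q["label"] for q in group["questions"])
            text = f"{section['title']} / {group['title']}: {labels}"
            chunks.append(Chunk(section["title"], group["title"], text, embed(text)))
    return chunks


def retrieve(chunks: list[Chunk], request: str, k: int = 3) -> list[Chunk]:
    """Return the k chunks most similar to the request."""
    query = embed(request)

    # Both vectors are unit length, so the dot product is the cosine similarity.
    def score(chunk: Chunk) -> float:
        return sum(a * b for a, b in zip(query, chunk.vector))

    return sorted(chunks, key=score, reverse=True)[:k]
```

Swapping in a real embedding model and vector store would change `embed` and `retrieve`, not the chunking strategy; only the retrieved chunks, plus the request, go into the prompt.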
What I learned
The reject button is the product. The interesting part of an AI-enabled workflow isn't the generation step — it's the loop that surrounds it. Rejection isn't an error state to suppress; it's the highest-signal feedback you can collect, and treating it as training data closes the loop cleanly.
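To make "rejection as training data" concrete, here is a minimal sketch assuming each review stores the request, the model's proposed diff, the reviewer's decision, and an optional developer correction. The `Review` record, `to_training_records`, and the (prompt, chosen, rejected) output shape are assumptions for illustration: that shape is what preference-tuning methods such as DPO consume, and nothing here says whether the actual fine-tuning used preference pairs or plain supervised examples.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    EDIT = "edit"


@dataclass
class Review:
    request: str                      # natural-language instruction sent to the model
    proposed: str                     # diff the model proposed, serialized as JSON
    decision: Decision
    correction: Optional[str] = None  # developer's corrected diff, if one was given


def to_training_records(reviews: list[Review]) -> list[dict[str, str]]:
    """Turn reviewed suggestions into (prompt, chosen, rejected) preference pairs.

    Approvals are skipped here, as are rejections with no correction, since
    there is no preferred answer to pair them with.
    """
    records = []
    for r in reviews:
        if r.decision is Decision.APPROVE or r.correction is None:
            continue
        records.append({"prompt": r.request, "chosen": r.correction, "rejected": r.proposed})
    return records


def to_jsonl(records: list[dict[str, str]]) -> str:
    """Serialize records one JSON object per line, a common fine-tuning input format."""
    return "".join(json.dumps(rec) + "\n" for rec in records)
```

Treating an edit as a rejection paired with its correction means every time a developer fixes a suggestion, the dataset grows by one labeled example at no extra cost to them.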
Latency budgets shape interaction design. Generating a whole form took 15+ seconds; generating a diff took 2. Those are different products entirely. The UI you build for a 2-second wait is not the UI you build for a 15-second one, and the willingness to refactor the request shape mattered more than any prompt engineering I did.
Augment, don't replace. Form developers were initially worried the tool would make their work obsolete. By the demo, the framing had landed: the tool would free them to work on the interesting forms (genuinely novel applications, complex conditional logic) while the AI handled the repetitive scaffolding. The reject button gave them control over what shipped.
What's next
The feature didn't reach production before the team was laid off. The prototype, the architecture, and the rollout plan are intact, and the human-in-the-loop pattern carried over to a related project I built shortly after — a computer vision system for the Parking Reform Network that uses the same approve / reject / refine pattern to validate AI-generated parking lot detections from satellite imagery.