Solution Architecture Document — ServiceBay AI
Project: ServiceBay AI
Author: Jarrod E. Brown
Program: IBM SkillsBuild AI Experiential Learning Lab 2026 (Custom challenge: Consumer Automotive Diagnostics & Trust)
Status: MVP complete · 16 of 17 end-to-end tests passing
Repository: github.com/jarrodebrown/servicebay-ai
1. Overview
ServiceBay AI is a no-code, multi-agent conversational assistant that turns a vague vehicle warning light into a verified diagnosis, recall check, fair-cost estimate, and shortlist of nearby repair shops — returned as one structured answer in under two minutes. The entire system runs on IBM watsonx Orchestrate.
The Orchestrator implements the Knowledge Synthesizer archetype: the information already exists across manuals, regulators, and cost references, but the value is in synthesis, not retrieval. Four specialist agents each retrieve a slice of the answer, and the Orchestrator composes them into a single response.
2. Problem & Context
Every driver eventually faces a dashboard warning with no context about severity or next steps. The available options are poor: forum threads contradict each other, shop quotes can't be verified, and roughly 38% of recalled vehicles are never repaired (NHTSA 2018–2022) because most drivers never check NHTSA themselves. The everyday driver speaks in symptoms ("yellow engine light"), not OBD-II codes, and does not own a scanner. No existing tool combines diagnosis, recall lookup, fair-cost estimate, and shop discovery in one conversational answer without a hardware purchase.
3. Goals & Requirements
Functional
- Accept a plain-language symptom plus vehicle and ZIP code.
- Classify severity (CRITICAL / CAUTION / INFORMATIONAL).
- Return a six-section answer: meaning, severity, recall status, cost, next step, nearby shops.
- Cite sources for every factual claim; never fabricate cost figures.
Non-functional
- Time from alert to confidence reduced from ~30 minutes to under 2 minutes.
- 100% source-cited responses; zero hallucinated or fabricated numbers.
- No-code/low-maintenance footprint suitable for a single builder.
4. Decision Rationale
Why watsonx Orchestrate over alternatives? The IBM SkillsBuild program provided access to Orchestrate's multi-agent runtime, but the choice has genuine architectural merit beyond availability. Orchestrate offers a no-code agent builder with native tool integration (OpenAPI specs import directly), built-in knowledge resources for RAG, and a managed runtime that eliminates infrastructure overhead. The alternative — a code-first framework like LangGraph or CrewAI — would have required building and hosting agent orchestration, tool routing, and RAG pipelines from scratch. For a single-builder project optimizing for time-to-value over customization depth, Orchestrate's managed approach was the right tradeoff.
Why multi-agent over monolithic? A monolithic prompt attempting diagnosis, recall lookup, cost estimation, and shop discovery in a single pass would face compounding context-window pressure and make it impossible to enforce domain-specific guardrails. The multi-agent split provides three concrete benefits: each agent carries only the instructions and tools relevant to its domain (the cost agent's hard rules against fabricated dollar amounts would conflict with the recall agent's need to surface raw NHTSA data); agents can be tested and improved independently (16 of 17 end-to-end tests target individual agent behaviors); and the pattern is portable — HandyHome AI reuses the same orchestrator-plus-specialists architecture for a different domain.
Why WISER prompting? Agent behavior in Orchestrate is defined entirely through natural-language instructions — there is no code layer to enforce contracts. The WISER framework (Who, Instructions, Sub-tasks, Examples, Review) provides a repeatable structure that makes agent behavior auditable and testable. Each agent's prompt follows the same five-section template, which made it straightforward to iterate on individual sections (e.g., tightening the cost agent's fabrication rules) without destabilizing the overall prompt.
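As a rough illustration of the five-section template, the sketch below renders a WISER prompt from structured fields. The `WiserPrompt` class and the sample wording are hypothetical; the actual prompts are natural-language instructions entered in Orchestrate, not generated by code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WiserPrompt:
    """One agent prompt laid out in the five WISER sections (hypothetical helper)."""
    who: str
    instructions: List[str]
    sub_tasks: List[str]
    examples: List[str] = field(default_factory=list)
    review: List[str] = field(default_factory=list)

    def render(self) -> str:
        # Keep the section order fixed so every agent prompt has the same shape.
        sections = [
            ("Who", [self.who]),
            ("Instructions", self.instructions),
            ("Sub-tasks", self.sub_tasks),
            ("Examples", self.examples),
            ("Review", self.review),
        ]
        lines: List[str] = []
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


# Illustrative wording only; the real prompt text lives in Orchestrate.
cost_prompt = WiserPrompt(
    who="You are the cost specialist for ServiceBay AI.",
    instructions=["Only return dollar figures found in the Repair Cost Reference."],
    sub_tasks=["Match the repair type", "Add assumption notes for inferred causes"],
    examples=["No matching row: reply 'cost estimate unavailable for this repair type'"],
    review=["Confirm every figure cites the reference."],
)
rendered = cost_prompt.render()
```

Because each section is a separate field, tightening one rule (for example, in `instructions`) leaves the other four sections untouched.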
Why GPT-OSS 120B via Groq? The Orchestrate trial environment routes through Groq's inference infrastructure, providing access to a capable open-source model with low-latency inference. This was a platform constraint rather than a deliberate model selection, but the performance characteristics — fast token generation supporting the sub-two-minute response target — aligned well with the system's needs.
5. Architecture Overview
6. Components
| # | Agent | Responsibility |
|---|---|---|
| 1 | servicebay_ai_agent (Orchestrator) | Parses intent, classifies severity, routes to specialists, composes the final six-section answer. |
| 2 | servicebay_knowledge_agent | RAG over vehicle owner's manuals with a 3-path fallback (manual hit / irrelevant retrieval / unsupported vehicle). |
| 3 | servicebay_recall_agent | Calls three NHTSA APIs (Recalls, Safety Ratings, Complaints) plus a VIN decoder; returns raw structured data. |
| 4 | servicebay_cost_agent | Retrieves from the Repair Cost Reference KB; hard rules block fabricated amounts and require assumption notes on inferred causes. |
| 5 | servicebay_repair_agent | Geocodes the ZIP and queries OpenStreetMap Overpass for shops within 5 miles (expands to 10 if sparse). |
7. RAG Implementation
ServiceBay AI uses Retrieval-Augmented Generation across two knowledge resources, both implemented as watsonx Orchestrate knowledge bases with document-backed retrieval.
Knowledge Source 1 — Vehicle Owner's Manuals. Ten split PDFs covering two reference vehicles (2019 Honda CR-V and 2021 Toyota Camry) are uploaded as the ServiceBay Vehicle Manuals knowledge resource. The knowledge agent retrieves against this corpus when the user describes a symptom, grounding its diagnostic explanation in manufacturer documentation rather than model-generated content.
Knowledge Source 2 — Repair Cost Reference. A curated CSV (14 repair types × 33 OBD-II codes, with vehicle-specific rows and generic fallbacks) is uploaded as the Repair Cost Reference knowledge resource. The cost agent retrieves matching rows to produce grounded cost estimates. Hard rules in the agent's WISER instructions prohibit returning any dollar figure not sourced from this reference — the primary anti-hallucination control.
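The lookup rule (vehicle-specific row first, then generic fallback, otherwise no figure at all) can be sketched in plain Python. This is only an approximation: in the MVP the rule is enforced through the cost agent's instructions and Orchestrate's retrieval. The column names, the `GENERIC` marker, and the dollar values below are placeholders invented for illustration and are not taken from the actual reference CSV.

```python
import csv
import io
from typing import Dict, List, Optional

# Placeholder rows: column names and dollar values are illustrative only.
SAMPLE_CSV = (
    "repair_type,obd_code,vehicle,low_usd,high_usd\n"
    "catalytic_converter,P0420,2019 Honda CR-V,100,200\n"
    "catalytic_converter,P0420,GENERIC,150,300\n"
)


def load_rows(text: str) -> List[Dict[str, str]]:
    """Parse the reference CSV into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def lookup_cost(rows: List[Dict[str, str]], obd_code: str, vehicle: str) -> Optional[dict]:
    """Prefer a vehicle-specific row, then a generic fallback, else None."""
    matches = [r for r in rows if r["obd_code"] == obd_code]
    for wanted, basis in ((vehicle, "vehicle-specific"), ("GENERIC", "generic fallback")):
        for row in matches:
            if row["vehicle"] == wanted:
                return {"low": int(row["low_usd"]), "high": int(row["high_usd"]), "basis": basis}
    return None


def cost_answer(rows: List[Dict[str, str]], obd_code: str, vehicle: str) -> str:
    """Never invent a figure: with no matching row, say so explicitly."""
    hit = lookup_cost(rows, obd_code, vehicle)
    if hit is None:
        return "cost estimate unavailable for this repair type"
    return f"${hit['low']}-${hit['high']} ({hit['basis']}, source: Repair Cost Reference)"


rows = load_rows(SAMPLE_CSV)
```

With these placeholder rows, a 2019 Honda CR-V query resolves to the vehicle-specific row, a 2021 Toyota Camry query falls back to the generic row, and an unknown code yields the "unavailable" message.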
Chunking and retrieval. Orchestrate handles document chunking and vector embedding internally; the builder uploads source documents and the platform manages the retrieval pipeline. Queries are routed to the appropriate knowledge resource by the Orchestrator based on intent classification — diagnostic queries go to the manuals, cost queries go to the reference CSV.
Three-path fallback (knowledge agent). The knowledge agent implements three response paths based on retrieval quality: (1) a manual hit, where retrieved content directly addresses the symptom; (2) irrelevant retrieval, where the retrieved chunks don't match the query and the agent acknowledges the gap rather than forcing an answer; and (3) unsupported vehicle, where the vehicle isn't in the knowledge base and the agent returns a documented limitation message. This fallback logic is tested explicitly — test scenario 17 validates the unsupported-vehicle path.
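The three paths can be pictured as a small decision function. This is a sketch under assumptions: the agent makes this choice through its prompt instructions, and the numeric relevance scores and the `min_score` threshold here are hypothetical stand-ins for however retrieval quality is actually judged.

```python
from enum import Enum
from typing import Iterable

# The two reference vehicles covered by the MVP manuals.
SUPPORTED_VEHICLES = {"2019 Honda CR-V", "2021 Toyota Camry"}


class Path(Enum):
    MANUAL_HIT = "manual hit"
    IRRELEVANT_RETRIEVAL = "irrelevant retrieval"
    UNSUPPORTED_VEHICLE = "unsupported vehicle"


def choose_path(vehicle: str, chunk_scores: Iterable[float], min_score: float = 0.5) -> Path:
    """Pick a response path; scores and threshold are illustrative assumptions."""
    if vehicle not in SUPPORTED_VEHICLES:
        # Path 3: return the documented limitation message.
        return Path.UNSUPPORTED_VEHICLE
    if any(score >= min_score for score in chunk_scores):
        # Path 1: at least one retrieved chunk addresses the symptom.
        return Path.MANUAL_HIT
    # Path 2: acknowledge the gap instead of forcing an answer.
    return Path.IRRELEVANT_RETRIEVAL
```

Checking vehicle support first means an unsupported vehicle never reaches the relevance check, so weak matches from another vehicle's manual cannot be mistaken for a hit.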
MVP scope and two use cases. The current RAG implementation targets two use cases: warning-light diagnosis (what does this light mean, grounded in the owner's manual) and repair-cost estimation (what will this cost, grounded in the reference CSV). These represent the MVP's core value proposition — turning a vague symptom into a verified, source-cited answer.
8. Data Flow
- User input. The driver submits a natural-language symptom description along with vehicle information (year, make, model, and optionally a VIN) and a ZIP code in the Orchestrate chat interface.
- Intent parsing and severity classification. The Orchestrator agent parses the query to extract the vehicle, symptom, and location. It classifies severity into one of three tiers — CRITICAL (stop driving), CAUTION (schedule service soon), or INFORMATIONAL (monitor) — which determines response urgency framing.
- Agent routing. The Orchestrator fans out to the relevant specialist agents. A typical warning-light query activates all four specialists; a recall-only query may skip the knowledge and cost agents.
- Knowledge retrieval (RAG). The knowledge agent queries the Vehicle Manuals knowledge resource with the symptom and vehicle context. Retrieved chunks are evaluated against the three-path fallback logic. The cost agent independently queries the Repair Cost Reference with the inferred repair type and vehicle specifics.
- External API calls. The recall agent calls three NHTSA endpoints (Recalls by make/model/year, Safety Ratings, and Complaints) and optionally the VIN decoder for precise vehicle identification. The repair agent geocodes the ZIP via Nominatim and queries OSM Overpass for shop=car_repair nodes within a 5-mile radius (expanding to 10 miles if fewer than three results).
- Response synthesis. The Orchestrator collects all agent responses and composes a single six-section answer (meaning, severity, recall status, cost estimate, recommended next step, nearby shops). Every factual claim includes its source attribution.
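The shop-search step in the flow above can be sketched as an Overpass QL query builder plus the radius-expansion rule. The function names are hypothetical, and the `search` callable stands in for the real HTTP call so the logic can be read without network access; the query uses Overpass's `around` filter, which takes a radius in meters.

```python
from typing import Callable, Dict, List

METERS_PER_MILE = 1609.344


def build_overpass_query(lat: float, lon: float, radius_miles: float) -> str:
    """Overpass QL for shop=car_repair nodes within a radius of a point."""
    radius_m = round(radius_miles * METERS_PER_MILE)
    return f'[out:json];node["shop"="car_repair"](around:{radius_m},{lat},{lon});out;'


def find_shops(
    lat: float,
    lon: float,
    search: Callable[[str], List[Dict]],
    min_results: int = 3,
) -> List[Dict]:
    """Search 5 miles first; widen to 10 miles if fewer than min_results."""
    shops = search(build_overpass_query(lat, lon, 5))
    if len(shops) < min_results:
        shops = search(build_overpass_query(lat, lon, 10))
    return shops


def fake_search(query: str) -> List[Dict]:
    """Stub for the Overpass call: sparse at 5 miles, fuller at 10 miles."""
    if "around:8047," in query:
        return [{"name": "Shop A"}]
    return [{"name": f"Shop {c}"} for c in "ABCD"]


nearby = find_shops(40.0, -75.0, fake_search)
```

Injecting `search` keeps the expansion rule testable on its own; a real implementation would send the query string to an Overpass endpoint and respect its usage policy.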
9. Data Model
The system operates on transient conversational state rather than a persistent database. Key data entities flow through the pipeline as follows:
User query context — vehicle descriptor (year, make, model), optional VIN, symptom description in natural language, and ZIP code. This context is parsed by the Orchestrator and passed to each specialist agent as structured input.
Agent context frames — each specialist agent receives its slice of the query context and returns a structured response. The knowledge agent returns diagnostic text with source attribution; the recall agent returns structured recall/rating/complaint records; the cost agent returns a cost range with repair type, assumptions, and source reference; the repair agent returns a list of shops with name, address, distance, and coordinates.
Knowledge base structure — two Orchestrate knowledge resources. The Vehicle Manuals resource stores chunked PDF content with metadata linking chunks to source documents (vehicle, manual section). The Repair Cost Reference stores CSV rows keyed by repair type, OBD-II code, and vehicle specifics, with generic fallback rows for unsupported vehicles.
Synthesized response — the Orchestrator's output: a six-section structured answer with severity classification, source citations per claim, and a recommended action tied to the severity tier.
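For readers who think in types, the entities above might look like the following dataclasses. This is a sketch only: the system holds this state in the conversation rather than in typed objects, and every class and field name here is an assumption chosen to mirror the prose.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Severity(Enum):
    """Three tiers, each tied to a recommended action."""
    CRITICAL = "stop driving"
    CAUTION = "schedule service soon"
    INFORMATIONAL = "monitor"


@dataclass(frozen=True)
class QueryContext:
    """What the Orchestrator parses from the driver's message."""
    year: int
    make: str
    model: str
    symptom: str
    zip_code: str
    vin: Optional[str] = None  # optional, enables precise identification

    @property
    def vehicle(self) -> str:
        return f"{self.year} {self.make} {self.model}"


@dataclass(frozen=True)
class CostEstimate:
    """Cost agent output; figures must come from the reference CSV."""
    repair_type: str
    low_usd: int
    high_usd: int
    assumptions: Tuple[str, ...]
    source: str


@dataclass(frozen=True)
class Shop:
    """One repair-agent result."""
    name: str
    address: str
    distance_miles: float
    lat: float
    lon: float


ctx = QueryContext(2019, "Honda", "CR-V", "yellow engine light", "00000")
```

Frozen dataclasses fit the transient, pass-along nature of the state: each agent reads its slice and returns a new value rather than mutating shared context.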
10. External Interfaces
OpenAPI-defined tools (6 total):
| Tool | Endpoint | Purpose | Rate / Auth |
|---|---|---|---|
| VIN Decoder | NHTSA vPIC API | Decode a VIN to year/make/model/body type | Public, no key |
| Recalls Lookup | NHTSA Recalls API | Active recalls for a year/make/model | Public, no key |
| Safety Ratings | NHTSA Safety Ratings API | NCAP crash-test ratings | Public, no key |
| Complaints | NHTSA Complaints API | Consumer complaints for a vehicle | Public, no key |
| ZIP Geocoder | Nominatim (OSM) | Convert a ZIP code to lat/lon coordinates | Public, usage policy |
| Repair Shop Search | OSM Overpass API | Find shop=car_repair POIs near coordinates | Public, usage policy |
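To make the NHTSA rows concrete, the sketch below builds request URLs for the four lookups. The URL shapes reflect NHTSA's public API documentation as best understood here and are an assumption about what the imported OpenAPI specs contain; they should be checked against those specs before use. The VIN shown is a non-real placeholder.

```python
from urllib.parse import quote, urlencode

# Base URLs assumed from NHTSA's public documentation; verify before relying on them.
NHTSA_API = "https://api.nhtsa.gov"
VPIC_API = "https://vpic.nhtsa.dot.gov/api/vehicles"


def _vehicle_query(make: str, model: str, year: int) -> str:
    return urlencode({"make": make, "model": model, "modelYear": year})


def recalls_url(make: str, model: str, year: int) -> str:
    """Recalls for a year/make/model."""
    return f"{NHTSA_API}/recalls/recallsByVehicle?{_vehicle_query(make, model, year)}"


def complaints_url(make: str, model: str, year: int) -> str:
    """Consumer complaints for a year/make/model."""
    return f"{NHTSA_API}/complaints/complaintsByVehicle?{_vehicle_query(make, model, year)}"


def safety_ratings_url(make: str, model: str, year: int) -> str:
    """NCAP ratings; this endpoint uses path segments rather than a query string."""
    return f"{NHTSA_API}/SafetyRatings/modelyear/{year}/make/{quote(make)}/model/{quote(model)}"


def vin_decode_url(vin: str) -> str:
    """vPIC VIN decode, JSON output."""
    return f"{VPIC_API}/DecodeVinValues/{quote(vin)}?format=json"


example = recalls_url("Honda", "CR-V", 2019)
```

All four are unauthenticated GET requests, which matches the "Public, no key" column in the table.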
RAG knowledge resources (2 total):
| Resource | Content | Format | Coverage |
|---|---|---|---|
| ServiceBay Vehicle Manuals | Owner's manuals for 2019 Honda CR-V and 2021 Toyota Camry | 10 split PDFs, Orchestrate-managed chunking | MVP: 2 vehicles |
| Repair Cost Reference | RepairPal-aligned cost ranges by repair type and OBD-II code | CSV, 14 repair types × 33 codes + generic fallbacks | MVP: common repairs |
11. Error Handling & Resilience
NHTSA API unavailability. The recall agent treats NHTSA as a best-effort data source. If any of the three NHTSA endpoints returns an error or times out, the agent returns a partial result with the available data and a note indicating which lookups could not be completed. The Orchestrator still synthesizes a response — a missing recall check does not block the diagnostic or cost sections. The response explicitly states that recall status could not be verified and recommends the driver check NHTSA.gov directly.
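A minimal sketch of this best-effort behavior, assuming each lookup is wrapped in a zero-argument callable: failures are recorded per endpoint instead of aborting the whole recall check. The function name and result shape are hypothetical, and the stubs below simulate one timeout rather than calling NHTSA.

```python
from typing import Any, Callable, Dict, List, Optional


def gather_recall_data(lookups: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Run every lookup; keep what succeeds and note what failed."""
    data: Dict[str, Any] = {}
    failed: List[str] = []
    for name, fetch in lookups.items():
        try:
            data[name] = fetch()
        except Exception:
            # Broad on purpose: any timeout or HTTP error means "not verified".
            failed.append(name)
    note: Optional[str] = None
    if failed:
        note = (
            "Could not complete: " + ", ".join(failed)
            + ". Status could not be verified; check NHTSA.gov directly."
        )
    return {"data": data, "failed": failed, "note": note}


def _simulated_timeout() -> Any:
    raise TimeoutError("simulated NHTSA timeout")


# Stubbed lookups standing in for the three NHTSA calls.
result = gather_recall_data({
    "recalls": lambda: ["stub recall record"],
    "safety_ratings": lambda: {"stub": True},
    "complaints": _simulated_timeout,
})
```

Here `result["data"]` still carries the recalls and safety-ratings stubs, while `result["note"]` names the complaints lookup as incomplete.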
OSM/Nominatim unavailability. If the geocoder or Overpass API is unreachable, the repair agent returns an empty shop list with an explanation. The Orchestrator omits the shop section and includes a fallback recommendation to search for nearby shops manually.
RAG retrieval failures. The knowledge agent's three-path fallback handles retrieval-quality issues at the application level. If the Orchestrate knowledge resource itself is unreachable (platform-level failure), the Orchestrator proceeds without the diagnostic section and notes the limitation.
Cost fabrication prevention. The cost agent's hard rules are the system's primary trust control. If no matching row exists in the Repair Cost Reference, the agent returns "cost estimate unavailable for this repair type" rather than generating a figure. This is a deliberate design choice — a missing estimate is better than a fabricated one.
Graceful degradation pattern. The system is designed so that any single agent failure produces a partial but honest response rather than a complete failure. The Orchestrator's synthesis step handles missing sections by acknowledging what couldn't be retrieved and suggesting the driver pursue that information through other channels.
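The pattern can be illustrated with a small composition function that fills all six sections and states plainly when one is missing. This is a sketch with hypothetical names and wording; in the MVP the Orchestrator does this through its prompt rather than through code.

```python
from typing import Dict, Optional

SECTIONS = ("meaning", "severity", "recall status", "cost", "next step", "nearby shops")


def synthesize(outputs: Dict[str, Optional[str]]) -> str:
    """Compose the six-section answer, acknowledging any missing section."""
    lines = []
    for name in SECTIONS:
        text = outputs.get(name)
        if text:
            lines.append(f"{name.title()}: {text}")
        else:
            # Partial but honest: say what is missing and point elsewhere.
            lines.append(
                f"{name.title()}: could not be retrieved; "
                "please check this through another channel."
            )
    return "\n".join(lines)


# Placeholder agent outputs; the repair agent is treated as unavailable.
answer = synthesize({
    "meaning": "(diagnostic text from the knowledge agent)",
    "severity": "CAUTION",
    "recall status": "(records from the recall agent)",
    "cost": "cost estimate unavailable for this repair type",
    "next step": "schedule service soon",
    "nearby shops": None,
})
```

Iterating over a fixed `SECTIONS` tuple guarantees the answer always has six lines, so a failed agent shows up as an explicit gap rather than a silently shorter response.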
12. Non-Functional Requirements (Measured)
| NFR | Target | Basis |
|---|---|---|
| End-to-end response time | < 2 minutes | Measured against the baseline of ~30 min manual research |
| Source citation coverage | 100% of factual claims | Enforced by agent instructions; validated in test suite |
| Cost fabrication rate | 0% | Hard rules in cost agent; tested with adversarial prompts |
| Test pass rate | 16/17 (94%) | End-to-end test suite; the 1 expected failure is the unsupported-vehicle scenario, which degrades gracefully via the fallback path and is flagged as a known scope limitation |
| Knowledge base coverage | 2 reference vehicles | MVP scope; unsupported vehicles handled by fallback path |
| Shop search radius | 5 mi default, 10 mi expanded | Expansion triggered when < 3 results in initial radius |
| Severity classification accuracy | 3-tier (CRITICAL / CAUTION / INFORMATIONAL) | Validated against manufacturer severity definitions in test cases |
13. Tech Stack
| Layer | Technology | Role |
|---|---|---|
| Agent runtime | IBM watsonx Orchestrate | No-code multi-agent orchestration, tool integrations, knowledge resources |
| LLM | GPT-OSS 120B via Groq | Inference engine (routed through Orchestrate trial environment) |
| Prompt framework | WISER | Structured agent behavior definition (Who, Instructions, Sub-tasks, Examples, Review) |
| External APIs | NHTSA (4 endpoints), Nominatim, Overpass | Vehicle data, geocoding, POI search |
| Knowledge ingestion | Orchestrate knowledge resources | PDF chunking + vector retrieval (manuals), CSV retrieval (cost reference) |
| Integration bridge | MCP Server (Python) | Exposes Orchestrate agents to Claude Desktop via Model Context Protocol |
| Export format | YAML/ZIP | Reproducible agent/tool/KB definitions under agents/orchestrate/exports/ |
14. Security & Compliance
All external calls are read-only against public government and open-data APIs; no PII is persisted. Cost figures are constrained to the curated reference KB, with explicit hard rules preventing fabricated dollar amounts; this is the primary trust and safety control of the system. The Orchestrate environment manages authentication and access control for the agent runtime. Apart from the lookup parameters sent to those APIs (vehicle descriptor, optional VIN, and ZIP code), no user data leaves the Orchestrate session boundary.
15. Deployment & Operations
The entire system runs inside watsonx Orchestrate; agents, tools, and knowledge bases are exported as YAML/zip definitions under agents/orchestrate/exports/ for reproducible import. A companion MCP server (separate repo: jarrodebrown/MCP-Server) exposes the same agents to Claude Desktop, enabling cross-platform access to ServiceBay AI's capabilities from within a general-purpose AI workflow.
16. Cross-Project Context
HandyHome AI (shared architecture pattern). HandyHome AI reuses the same orchestrator-plus-specialists multi-agent pattern for home-improvement diagnostics. The architectural decisions validated in ServiceBay AI — the Knowledge Synthesizer archetype, WISER prompting, domain-specific agent isolation, and the graceful-degradation pattern — transfer directly. The two projects also share a delivery-vertical concept: ServiceBay's Auto Parts Delivery and HandyHome's Home Improvement Delivery share approximately 70% of their platform engine (real-time multi-chain inventory aggregation, gig-driver dispatch, location-aware store routing).
MCP Server (integration bridge). The ServiceBay AI MCP Server bridges watsonx Orchestrate and Claude Desktop via the Model Context Protocol. It exposes four tools — list_agents, chat_with_agent, check_recalls, and find_repair_shops — that let Claude call ServiceBay AI agents natively. The two direct-API tools (check_recalls and find_repair_shops) operate independently of Orchestrate, providing resilience: if the Orchestrate instance is unavailable, recall lookups and shop discovery still work through Claude.
17. Risks, Assumptions & Limitations
- Manual-backed RAG currently covers two reference vehicles; unsupported vehicles fall back to a documented limitation path (1 of 17 test scenarios).
- Cost accuracy depends on the curated reference; inferred-cause paths require assumption notes. The system will not fabricate a cost figure — it returns "unavailable" rather than guessing.
- Shop discovery quality depends on OpenStreetMap coverage in the driver's area. Rural areas with sparse OSM data may return few or no results even at the expanded 10-mile radius.
- The system assumes the driver's symptom description is accurate. Ambiguous or misleading symptom descriptions may route to incorrect diagnostic paths.
- Model behavior is governed by WISER prompts, not code. Prompt drift across model updates is a maintenance risk — the test suite is the primary regression guard.
18. Roadmap
Phase 1 — Everyday Driver (current). Free-tier acquisition funnel targeting the everyday driver who encounters a warning light. MVP delivers diagnosis, recall check, cost estimate, and shop discovery in one conversational answer. The core value proposition: turn 30 minutes of fragmented research into a 2-minute verified answer.
Phase 2 — DIY Enthusiast. Expand to OBD-II code lookup (accept scanner codes directly, not just symptom descriptions) and parts identification with affiliate-revenue links. This phase serves drivers who own a scanner but need help interpreting codes and sourcing parts. Requires expanding the knowledge base to cover OBD-II code definitions and adding parts-catalog integrations.
Phase 3 — Collector & Enthusiast. Premium subscription tier for restoration guidance and vehicle valuation. This phase targets classic-car owners and enthusiasts who need specialized knowledge not covered by standard owner's manuals. Requires curated restoration knowledge bases and integration with valuation data sources.
Auto Parts Delivery (spin-off). An on-demand last-mile delivery vertical for auto parts, sharing ~70% of its platform engine with HandyHome AI's Home Improvement Delivery. Partners include AutoZone, O'Reilly, NAPA, and Advance Auto Parts (phased). The integration point is natural: ServiceBay AI identifies what's needed, the delivery vertical brings it to you.
Diagrams: Sequence Diagram · Data Flow Diagram · Component Diagram