Solution Architecture Document — ServiceBay AI

Project: ServiceBay AI
Author: Jarrod E. Brown
Program: IBM SkillsBuild AI Experiential Learning Lab 2026 (Custom challenge: Consumer Automotive Diagnostics & Trust)
Status: MVP complete; 16 of 17 end-to-end tests passing
Repository: Private

1. Overview

ServiceBay AI is a no-code, multi-agent conversational assistant that turns a vague vehicle warning light into a verified diagnosis, recall check, fair-cost estimate, and shortlist of nearby repair shops — returned as one structured answer in under two minutes. The entire system runs on IBM watsonx Orchestrate.

The Orchestrator implements the Knowledge Synthesizer archetype: the information already exists across manuals, regulators, and cost references, but the value is in synthesis, not retrieval. Four specialist agents each retrieve a slice of the answer and the Orchestrator composes them into a single response.

2. Problem & Context

Every driver eventually faces a dashboard warning with no context about severity or next steps. The available options are poor: forum threads contradict each other, shop quotes can't be verified, and roughly 38% of recalled vehicles are never repaired (NHTSA 2018–2022) because most drivers never check NHTSA themselves. The everyday driver speaks in symptoms ("yellow engine light"), not OBD-II codes, and does not own a scanner. No existing tool combines diagnosis, recall lookup, fair-cost estimate, and shop discovery in one conversational answer without a hardware purchase.

3. Goals & Requirements

Functional

Accept a plain-language symptom plus vehicle and ZIP code.
Classify severity (CRITICAL / CAUTION / INFORMATIONAL).
Return a six-section answer: meaning, severity, recall status, cost, next step, nearby shops.
Cite sources for every factual claim; never fabricate cost figures.

Non-functional

Time from alert to confidence reduced from ~30 minutes to under 2 minutes.
100% source-cited responses; zero hallucinated or fabricated numbers.
No-code/low-maintenance footprint suitable for a single builder.

4. Decision Rationale

Why watsonx Orchestrate over alternatives? The IBM SkillsBuild program provided access to Orchestrate's multi-agent runtime, but the choice has genuine architectural merit beyond availability. Orchestrate offers a no-code agent builder with native tool integration (OpenAPI specs import directly), built-in knowledge resources for RAG, and a managed runtime that eliminates infrastructure overhead. The alternative — a code-first framework like LangGraph or CrewAI — would have required building and hosting agent orchestration, tool routing, and RAG pipelines from scratch. For a single-builder project optimizing for time-to-value over customization depth, Orchestrate's managed approach was the right tradeoff.

Why multi-agent over monolithic? A monolithic prompt attempting diagnosis, recall lookup, cost estimation, and shop discovery in a single pass would face compounding context-window pressure and make it impossible to enforce domain-specific guardrails. The multi-agent split provides three concrete benefits: each agent carries only the instructions and tools relevant to its domain (the cost agent's hard rules against fabricated dollar amounts would conflict with the recall agent's need to surface raw NHTSA data); agents can be tested and improved independently (16 of 17 end-to-end tests target individual agent behaviors); and the pattern is portable — the same orchestrator-plus-specialists architecture can be reused for a different domain.

Why WISER prompting? Agent behavior in Orchestrate is defined entirely through natural-language instructions — there is no code layer to enforce contracts. The WISER framework (Who, Instructions, Sub-tasks, Examples, Review) provides a repeatable structure that makes agent behavior auditable and testable. Each agent's prompt follows the same five-section template, which made it straightforward to iterate on individual sections (e.g., tightening the cost agent's fabrication rules) without destabilizing the overall prompt.
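The five-section template can be pictured as a small data structure that renders to prompt text. This is a minimal sketch, not the project's actual prompt tooling: the `WiserPrompt` class, its field names, and the example wording are all hypothetical, and in Orchestrate the prompt is entered as natural-language instructions rather than generated by code.

```python
from dataclasses import dataclass


@dataclass
class WiserPrompt:
    """Hypothetical container for the five WISER sections."""
    who: str
    instructions: str
    sub_tasks: list[str]
    examples: list[str]
    review: str

    def render(self) -> str:
        # Emit the sections in the fixed WISER order so every agent
        # prompt has the same auditable shape.
        sections = [
            ("Who", self.who),
            ("Instructions", self.instructions),
            ("Sub-tasks", "\n".join(f"- {t}" for t in self.sub_tasks)),
            ("Examples", "\n".join(f"- {e}" for e in self.examples)),
            ("Review", self.review),
        ]
        return "\n\n".join(f"## {name}\n{body}" for name, body in sections)


# Illustrative wording only; the real agent prompts are not shown in this document.
cost_prompt = WiserPrompt(
    who="You are the ServiceBay cost agent.",
    instructions="Return only dollar figures found in the Repair Cost Reference.",
    sub_tasks=["Match the repair type", "Attach assumption notes to inferred causes"],
    examples=["No matching row: reply that the estimate is unavailable."],
    review="Confirm every figure cites the reference before answering.",
)
print(cost_prompt.render())
```

Keeping each section as its own field is what makes it cheap to tighten one section (for example, the fabrication rules under Instructions) without touching the others.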

Why GPT-OSS 120B via Groq? The Orchestrate trial environment routes through Groq's inference infrastructure, providing access to a capable open-source model with low-latency inference. This was a platform constraint rather than a deliberate model selection, but the performance characteristics — fast token generation supporting the sub-two-minute response target — aligned well with the system's needs.

5. Architecture Overview

Component Diagram: full system architecture

6. Components

#	Agent	Responsibility
1	`servicebay_ai_agent` (Orchestrator)	Parses intent, classifies severity, routes to specialists, composes the final six-section answer.
2	`servicebay_knowledge_agent`	RAG over vehicle owner's manuals with a 3-path fallback (manual hit / irrelevant retrieval / unsupported vehicle).
3	`servicebay_recall_agent`	Calls three NHTSA APIs (Recalls, Safety Ratings, Complaints) plus a VIN decoder; returns raw structured data.
4	`servicebay_cost_agent`	Retrieves from the Repair Cost Reference KB; hard rules block fabricated amounts and require assumption notes on inferred causes.
5	`servicebay_repair_agent`	Geocodes the ZIP and queries OpenStreetMap Overpass for shops within 5 miles (expands to 10 if sparse).

7. RAG Implementation

Sequence Diagram: RAG retrieval path within the query flow

ServiceBay AI uses Retrieval-Augmented Generation across two knowledge resources, both implemented as watsonx Orchestrate knowledge bases with document-backed retrieval.

Knowledge Source 1 — Vehicle Owner's Manuals. Ten split PDFs covering two reference vehicles (2019 Honda CR-V and 2021 Toyota Camry) are uploaded as the ServiceBay Vehicle Manuals knowledge resource. The knowledge agent retrieves against this corpus when the user describes a symptom, grounding its diagnostic explanation in manufacturer documentation rather than model-generated content.

Knowledge Source 2 — Repair Cost Reference. A curated CSV (14 repair types × 33 OBD-II codes, with vehicle-specific rows and generic fallbacks) is uploaded as the Repair Cost Reference knowledge resource. The cost agent retrieves matching rows to produce grounded cost estimates. Hard rules in the agent's WISER instructions prohibit returning any dollar figure not sourced from this reference — the primary anti-hallucination control.

Chunking and retrieval. Orchestrate handles document chunking and vector embedding internally; the builder uploads source documents and the platform manages the retrieval pipeline. Queries are routed to the appropriate knowledge resource by the Orchestrator based on intent classification — diagnostic queries go to the manuals, cost queries go to the reference CSV.

Three-path fallback (knowledge agent). The knowledge agent implements three response paths based on retrieval quality: (1) a manual hit, where retrieved content directly addresses the symptom; (2) irrelevant retrieval, where the retrieved chunks don't match the query and the agent acknowledges the gap rather than forcing an answer; and (3) unsupported vehicle, where the vehicle isn't in the knowledge base and the agent returns a documented limitation message. This fallback logic is tested explicitly — test scenario 17 validates the unsupported-vehicle path.
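The three paths amount to a short decision procedure. The sketch below illustrates that logic only: in ServiceBay AI the behavior is defined by the agent's natural-language instructions, and the relevance scores, the `RELEVANCE_THRESHOLD` value, and the `choose_path` function are assumptions introduced for illustration (Orchestrate may not expose retrieval scores in this form).

```python
from enum import Enum


class RetrievalPath(Enum):
    MANUAL_HIT = "manual_hit"
    IRRELEVANT_RETRIEVAL = "irrelevant_retrieval"
    UNSUPPORTED_VEHICLE = "unsupported_vehicle"


# The two reference vehicles covered by the MVP knowledge base.
SUPPORTED_VEHICLES = {(2019, "honda", "cr-v"), (2021, "toyota", "camry")}

# Hypothetical cutoff below which retrieved chunks count as irrelevant.
RELEVANCE_THRESHOLD = 0.5


def choose_path(year: int, make: str, model: str,
                chunk_scores: list[float]) -> RetrievalPath:
    """Pick one of the three response paths for a knowledge query."""
    if (year, make.lower(), model.lower()) not in SUPPORTED_VEHICLES:
        return RetrievalPath.UNSUPPORTED_VEHICLE
    if not chunk_scores or max(chunk_scores) < RELEVANCE_THRESHOLD:
        # Acknowledge the gap rather than forcing an answer.
        return RetrievalPath.IRRELEVANT_RETRIEVAL
    return RetrievalPath.MANUAL_HIT
```

Checking vehicle support first means an unsupported vehicle never reaches the relevance check, so a coincidental chunk match cannot be mistaken for a manual hit.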

MVP scope and two use cases. The current RAG implementation targets two use cases: warning-light diagnosis (what does this light mean, grounded in the owner's manual) and repair-cost estimation (what will this cost, grounded in the reference CSV). These represent the MVP's core value proposition — turning a vague symptom into a verified, source-cited answer.

8. Data Flow

Data Flow Diagram: visual trace of data through the system

User input. The driver submits a natural-language symptom description along with vehicle information (year, make, model, and optionally a VIN) and a ZIP code in the Orchestrate chat interface.
Intent parsing and severity classification. The Orchestrator agent parses the query to extract the vehicle, symptom, and location. It classifies severity into one of three tiers — CRITICAL (stop driving), CAUTION (schedule service soon), or INFORMATIONAL (monitor) — which determines response urgency framing.
Agent routing. The Orchestrator fans out to the relevant specialist agents. A typical warning-light query activates all four specialists; a recall-only query may skip the knowledge and cost agents.
Knowledge retrieval (RAG). The knowledge agent queries the Vehicle Manuals knowledge resource with the symptom and vehicle context. Retrieved chunks are evaluated against the three-path fallback logic. The cost agent independently queries the Repair Cost Reference with the inferred repair type and vehicle specifics.
External API calls. The recall agent calls three NHTSA endpoints (Recalls by make/model/year, Safety Ratings, and Complaints) and optionally the VIN decoder for precise vehicle identification. The repair agent geocodes the ZIP via Nominatim and queries OSM Overpass for shop=car_repair nodes within a 5-mile radius (expanding to 10 miles if fewer than three results).
Response synthesis. The Orchestrator collects all agent responses and composes a single six-section answer (meaning, severity, recall status, cost estimate, recommended next step, nearby shops). Every factual claim includes its source attribution.
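The severity tiers and the routing step above can be sketched as follows. This is an illustration under stated assumptions: the intent labels (`"warning_light"`, `"recall_only"`) and the `route` function are hypothetical, and the actual Orchestrator makes these choices from its prompt instructions rather than from code. The agent names are the ones listed in the Components section.

```python
from enum import Enum


class Severity(Enum):
    """Three tiers; each value is the urgency framing from the data flow."""
    CRITICAL = "stop driving"
    CAUTION = "schedule service soon"
    INFORMATIONAL = "monitor"


ALL_SPECIALISTS = (
    "servicebay_knowledge_agent",
    "servicebay_recall_agent",
    "servicebay_cost_agent",
    "servicebay_repair_agent",
)


def route(intent: str) -> tuple[str, ...]:
    """Return the specialists to call for a (hypothetical) intent label."""
    if intent == "recall_only":
        # A recall-only query may skip the knowledge and cost agents.
        skipped = {"servicebay_knowledge_agent", "servicebay_cost_agent"}
        return tuple(a for a in ALL_SPECIALISTS if a not in skipped)
    # A typical warning-light query activates all four specialists.
    return ALL_SPECIALISTS
```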

9. Data Model

The system operates on transient conversational state rather than a persistent database. Key data entities flow through the pipeline as follows:

User query context — vehicle descriptor (year, make, model), optional VIN, symptom description in natural language, and ZIP code. This context is parsed by the Orchestrator and passed to each specialist agent as structured input.

Agent context frames — each specialist agent receives its slice of the query context and returns a structured response. The knowledge agent returns diagnostic text with source attribution; the recall agent returns structured recall/rating/complaint records; the cost agent returns a cost range with repair type, assumptions, and source reference; the repair agent returns a list of shops with name, address, distance, and coordinates.

Knowledge base structure — two Orchestrate knowledge resources. The Vehicle Manuals resource stores chunked PDF content with metadata linking chunks to source documents (vehicle, manual section). The Repair Cost Reference stores CSV rows keyed by repair type, OBD-II code, and vehicle specifics, with generic fallback rows for unsupported vehicles.

Synthesized response — the Orchestrator's output: a six-section structured answer with severity classification, source citations per claim, and a recommended action tied to the severity tier.
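The entities above could be written down as typed records. The sketch below is one possible shape, assuming field names that do not appear in the source (`zip_code`, `low_usd`, `distance_miles`, and so on); the system itself passes these values as transient conversational state, not as Python objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QueryContext:
    """What the Orchestrator parses from the driver's message."""
    year: int
    make: str
    model: str
    symptom: str
    zip_code: str
    vin: Optional[str] = None  # the VIN is optional in the user input


@dataclass(frozen=True)
class CostEstimate:
    """Cost agent output: a range plus the notes that make it auditable."""
    repair_type: str
    low_usd: float
    high_usd: float
    assumptions: tuple[str, ...]
    source: str


@dataclass(frozen=True)
class Shop:
    """Repair agent output for one nearby shop."""
    name: str
    address: str
    distance_miles: float
    lat: float
    lon: float


ctx = QueryContext(year=2019, make="Honda", model="CR-V",
                   symptom="yellow engine light", zip_code="00000")
```

Frozen dataclasses fit the transient-state model: each record is created once per query and never mutated as it moves between agents.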

10. External Interfaces

OpenAPI-defined tools (6 total):

Tool	Endpoint	Purpose	Rate / Auth
VIN Decoder	NHTSA vPIC API	Decode a VIN to year/make/model/body type	Public, no key
Recalls Lookup	NHTSA Recalls API	Active recalls for a year/make/model	Public, no key
Safety Ratings	NHTSA Safety Ratings API	NCAP crash-test ratings	Public, no key
Complaints	NHTSA Complaints API	Consumer complaints for a vehicle	Public, no key
ZIP Geocoder	Nominatim (OSM)	Convert a ZIP code to lat/lon coordinates	Public, usage policy
Repair Shop Search	OSM Overpass API	Find `shop=car_repair` POIs near coordinates	Public, usage policy
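As an illustration of the Repair Shop Search tool, the sketch below builds an Overpass QL `around` query for `shop=car_repair` nodes and applies the radius-expansion rule (5 miles, then 10 miles if fewer than three results). The function names and the injected `search` callable are assumptions for this example; the project's actual tool is defined by an OpenAPI spec inside Orchestrate, and the exact query it sends is not shown in this document.

```python
from typing import Callable

METERS_PER_MILE = 1609.344


def build_overpass_query(lat: float, lon: float, radius_miles: float) -> str:
    """Build an Overpass QL query for car-repair nodes around a point."""
    radius_m = round(radius_miles * METERS_PER_MILE)  # Overpass expects meters
    return (
        "[out:json][timeout:25];"
        f'node["shop"="car_repair"](around:{radius_m},{lat},{lon});'
        "out body;"
    )


def find_shops(search: Callable[[str], list], lat: float, lon: float,
               default_miles: float = 5, expanded_miles: float = 10,
               min_results: int = 3) -> list:
    """Search the default radius, widening once if results are sparse.

    `search` is a hypothetical callable that sends the query to an
    Overpass endpoint and returns the list of matching elements.
    """
    shops = search(build_overpass_query(lat, lon, default_miles))
    if len(shops) < min_results:
        shops = search(build_overpass_query(lat, lon, expanded_miles))
    return shops
```

Passing `search` in as a parameter keeps the expansion rule testable without network access and leaves the public endpoints' usage policies to the caller.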

RAG knowledge resources (2 total):

Resource	Content	Format	Coverage
ServiceBay Vehicle Manuals	Owner's manuals for 2019 Honda CR-V and 2021 Toyota Camry	10 split PDFs, Orchestrate-managed chunking	MVP: 2 vehicles
Repair Cost Reference	RepairPal-aligned cost ranges by repair type and OBD-II code	CSV, 14 repair types × 33 codes + generic fallbacks	MVP: common repairs

11. Error Handling & Resilience

NHTSA API unavailability. The recall agent treats NHTSA as a best-effort data source. If any of the three NHTSA endpoints returns an error or times out, the agent returns a partial result with the available data and a note indicating which lookups could not be completed. The Orchestrator still synthesizes a response — a missing recall check does not block the diagnostic or cost sections. The response explicitly states that recall status could not be verified and recommends the driver check NHTSA.gov directly.
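The best-effort behavior can be sketched as a loop that records which lookups failed instead of aborting. This is a minimal illustration, assuming each lookup is wrapped in a zero-argument callable; `collect_nhtsa` and the result keys are hypothetical names, not part of the project.

```python
from typing import Any, Callable


def collect_nhtsa(lookups: dict[str, Callable[[], Any]]) -> dict[str, Any]:
    """Run each lookup; keep what succeeds and note what did not."""
    results: dict[str, Any] = {}
    unavailable: list[str] = []
    for name, call in lookups.items():
        try:
            results[name] = call()
        except Exception:
            # Any error or timeout degrades this one lookup, not the whole answer.
            unavailable.append(name)
    return {"data": results, "unavailable": unavailable}


def _timed_out() -> list:
    # Stand-in for an endpoint that fails to respond.
    raise TimeoutError("NHTSA endpoint did not respond")


partial = collect_nhtsa({
    "recalls": _timed_out,
    "safety_ratings": lambda: [],
    "complaints": lambda: [],
})
```

The `unavailable` list is what lets the final answer say explicitly that recall status could not be verified and point the driver to NHTSA.gov.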

OSM/Nominatim unavailability. If the geocoder or Overpass API is unreachable, the repair agent returns an empty shop list with an explanation. The Orchestrator omits the shop section and includes a fallback recommendation to search for nearby shops manually.

RAG retrieval failures. The knowledge agent's three-path fallback handles retrieval-quality issues at the application level. If the Orchestrate knowledge resource itself is unreachable (platform-level failure), the Orchestrator proceeds without the diagnostic section and notes the limitation.

Cost fabrication prevention. The cost agent's hard rules are the system's primary trust control. If no matching row exists in the Repair Cost Reference, the agent returns "cost estimate unavailable for this repair type" rather than generating a figure. This is a deliberate design choice — a missing estimate is better than a fabricated one.
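The rule reduces to "look up, never generate". The sketch below assumes a hypothetical CSV layout (`repair_type`, `vehicle`, `low_usd`, `high_usd`) and uses placeholder rows whose numbers are not taken from the real reference. It tries a vehicle-specific row, then the generic fallback row, and otherwise returns the unavailable message.

```python
import csv
import io

UNAVAILABLE = "cost estimate unavailable for this repair type"

# Placeholder rows for illustration only; column names and dollar values
# are assumptions, not data from the actual Repair Cost Reference.
SAMPLE_CSV = """repair_type,vehicle,low_usd,high_usd
oxygen_sensor_replacement,2019 Honda CR-V,100,200
oxygen_sensor_replacement,generic,90,250
"""


def lookup_cost(rows: list[dict], repair_type: str, vehicle: str) -> str:
    """Return a sourced cost range, or the unavailable message."""
    matches = [r for r in rows if r["repair_type"] == repair_type]
    # Prefer the vehicle-specific row, then the generic fallback row.
    for wanted in (vehicle, "generic"):
        for row in matches:
            if row["vehicle"] == wanted:
                return (f"${row['low_usd']}-${row['high_usd']} "
                        f"(source: Repair Cost Reference, {row['vehicle']} row)")
    # No matching row: report the gap instead of inventing a figure.
    return UNAVAILABLE


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
```

Because the function can only return values read from a row, there is no code path on which a figure outside the reference could appear; the prompt-level hard rules aim for the same property.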

Graceful degradation pattern. The system is designed so that any single agent failure produces a partial but honest response rather than a complete failure. The Orchestrator's synthesis step handles missing sections by acknowledging what couldn't be retrieved and suggesting the driver pursue that information through other channels.
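A sketch of the synthesis step under this pattern: every one of the six sections appears in the answer, and a section whose agent failed is acknowledged rather than dropped silently. The `synthesize` function, the `parts` dictionary, and the wording of the placeholder are assumptions for illustration.

```python
SECTIONS = ["meaning", "severity", "recall status",
            "cost", "next step", "nearby shops"]

MISSING_NOTE = "Could not be retrieved; please check this through another channel."


def synthesize(parts: dict[str, str]) -> str:
    """Compose the six-section answer, acknowledging any missing section."""
    lines = []
    for section in SECTIONS:
        content = parts.get(section)
        if content is None:
            # A failed agent yields an honest gap, not a failed response.
            content = MISSING_NOTE
        lines.append(f"{section.title()}: {content}")
    return "\n".join(lines)
```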

12. Non-Functional Requirements (Measured)

NFR	Target	Basis
End-to-end response time	< 2 minutes	Measured against the baseline of ~30 min manual research
Source citation coverage	100% of factual claims	Enforced by agent instructions; validated in test suite
Cost fabrication rate	0%	Hard rules in cost agent; tested with adversarial prompts
Test pass rate	16/17 (94%)	End-to-end test suite; the 1 failure is expected (unsupported-vehicle scenario: the fallback path responds correctly, and the gap is flagged as a known scope limitation)
Knowledge base coverage	2 reference vehicles	MVP scope; unsupported vehicles handled by fallback path
Shop search radius	5 mi default, 10 mi expanded	Expansion triggered when < 3 results in initial radius
Severity classification accuracy	3-tier (CRITICAL / CAUTION / INFORMATIONAL)	Validated against manufacturer severity definitions in test cases

13. Tech Stack

Layer	Technology	Role
Agent runtime	IBM watsonx Orchestrate	No-code multi-agent orchestration, tool integrations, knowledge resources
LLM	GPT-OSS 120B via Groq	Inference engine (routed through Orchestrate trial environment)
Prompt framework	WISER	Structured agent behavior definition (Who, Instructions, Sub-tasks, Examples, Review)
External APIs	NHTSA (4 endpoints), Nominatim, Overpass	Vehicle data, geocoding, POI search
Knowledge ingestion	Orchestrate knowledge resources	PDF chunking + vector retrieval (manuals), CSV retrieval (cost reference)
Integration bridge	MCP Server (Python)	Exposes Orchestrate agents to Claude Desktop via Model Context Protocol
Export format	YAML/ZIP	Reproducible agent/tool/KB definitions under `agents/orchestrate/exports/`

14. Security & Compliance

All external calls are read-only against public government and open-data APIs; no PII is persisted. Cost figures are constrained to the curated reference KB, with explicit hard rules preventing fabricated dollar amounts; this is the primary trust and safety control of the system. The Orchestrate environment manages authentication and access control for the agent runtime. The only user-supplied data that leaves the Orchestrate session are the lookup parameters the external APIs need: vehicle details or VIN for NHTSA, and the ZIP code and derived coordinates for Nominatim and Overpass.

15. Deployment & Operations

The entire system runs inside watsonx Orchestrate; agents, tools, and knowledge bases are exported as YAML/ZIP definitions under `agents/orchestrate/exports/` for reproducible import. A companion MCP server (separate private repo) exposes the same agents to Claude Desktop, enabling cross-platform access to ServiceBay AI's capabilities from within a general-purpose AI workflow.

16. Cross-Project Context

Portable architecture pattern. The same orchestrator-plus-specialists multi-agent pattern transfers directly to other diagnostic domains. The architectural decisions validated in ServiceBay AI — the Knowledge Synthesizer archetype, WISER prompting, domain-specific agent isolation, and the graceful-degradation pattern — are domain-agnostic. The pattern also extends to a delivery-vertical concept: ServiceBay's Auto Parts Delivery shares the bulk of a common platform engine (real-time multi-chain inventory aggregation, gig-driver dispatch, location-aware store routing) that generalizes across verticals.

MCP Server (integration bridge). The ServiceBay AI MCP Server bridges watsonx Orchestrate and Claude Desktop via the Model Context Protocol. It exposes four tools — list_agents, chat_with_agent, check_recalls, and find_repair_shops — that let Claude call ServiceBay AI agents natively. The two direct-API tools (check_recalls and find_repair_shops) operate independently of Orchestrate, providing resilience: if the Orchestrate instance is unavailable, recall lookups and shop discovery still work through Claude.
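The resilience property can be stated as a filter over the four tools. The sketch below is plain Python rather than the MCP SDK, and it assumes that `list_agents` and `chat_with_agent` are the two tools that depend on Orchestrate (the source only states that the other two do not); the `TOOLS` mapping and `available_tools` function are hypothetical.

```python
# True means the tool needs a reachable Orchestrate instance (assumption
# for list_agents and chat_with_agent; the direct-API tools do not).
TOOLS = {
    "list_agents": True,
    "chat_with_agent": True,
    "check_recalls": False,
    "find_repair_shops": False,
}


def available_tools(orchestrate_up: bool) -> list[str]:
    """List the tools that can still serve requests."""
    return [name for name, needs_orchestrate in TOOLS.items()
            if orchestrate_up or not needs_orchestrate]
```

With Orchestrate down, only the two direct-API tools remain, which matches the behavior described above: recall lookups and shop discovery keep working through Claude.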

17. Risks, Assumptions & Limitations

Manual-backed RAG currently covers two reference vehicles; unsupported vehicles fall back to a documented limitation path (1 of 17 test scenarios).
Cost accuracy depends on the curated reference; inferred-cause paths require assumption notes. The system will not fabricate a cost figure — it returns "unavailable" rather than guessing.
Shop discovery quality depends on OpenStreetMap coverage in the driver's area. Rural areas with sparse OSM data may return few or no results even at the expanded 10-mile radius.
The system assumes the driver's symptom description is accurate. Ambiguous or misleading symptom descriptions may route to incorrect diagnostic paths.
Model behavior is governed by WISER prompts, not code. Prompt drift across model updates is a maintenance risk — the test suite is the primary regression guard.

18. Roadmap

Phase 1 — Everyday Driver (current). Free-tier acquisition funnel targeting the everyday driver who encounters a warning light. MVP delivers diagnosis, recall check, cost estimate, and shop discovery in one conversational answer. The core value proposition: turn 30 minutes of fragmented research into a 2-minute verified answer.

Phase 2 — DIY Enthusiast. Expand to OBD-II code lookup (accept scanner codes directly, not just symptom descriptions) and parts identification with affiliate-revenue links. This phase serves drivers who own a scanner but need help interpreting codes and sourcing parts. Requires expanding the knowledge base to cover OBD-II code definitions and adding parts-catalog integrations.

Phase 3 — Collector & Enthusiast. Premium subscription tier for restoration guidance and vehicle valuation. This phase targets classic-car owners and enthusiasts who need specialized knowledge not covered by standard owner's manuals. Requires curated restoration knowledge bases and integration with valuation data sources.

Auto Parts Delivery (spin-off). An on-demand last-mile delivery vertical for auto parts, built on a shared platform engine that generalizes across delivery verticals. Partners include AutoZone, O'Reilly, NAPA, and Advance Auto Parts (phased). The integration point is natural: ServiceBay AI identifies what's needed, the delivery vertical brings it to you.

Diagrams: Sequence Diagram · Data Flow Diagram · Component Diagram