- Define the technical architecture for SB Energy agent workflows, including ChatGPT, ROCstar, MCP, agents, skills, tools, APIs, retrieval, structured outputs, trace metadata, evals, and observability.
- Establish reusable workflow patterns for agent planning, tool calling, data retrieval, exception handling, fallback behavior, escalation, and production support.
- Create standards for skill design, prompt structure, tool descriptions, tool schemas, API response formats, source citation behavior, structured outputs, and workflow handoffs.
- Define token and context management practices, including when to use conversation state, retrieval, file search, cached context, summarization, structured intermediate state, and compressed tool returns.
- Build evaluation frameworks for agent workflows, including gold datasets, regression tests, numerical reconciliation checks, rubric-based grading, tool-call correctness checks, and human feedback loops.
- Lead the design of observability for agents and tools, including workflow logs, cost, latency, token usage, tool success rate, bad-answer rate, eval pass rate, user acceptance, and incident tracking.
- Partner with domain Forward Deployed Engineers (FDEs) to convert high-value workflows into measurable agent systems with explicit inputs, outputs, owners, permissions, evals, runbooks, release gates, and continuous improvement loops.
- Partner with the Enterprise Systems and Agent Platform role on MCP servers, connector reliability, RBAC, secrets, audit logging, deployment patterns, API governance, and production platform readiness.
- Review agent workflow designs before production release and define go/no-go criteria for quality, safety, reliability, cost, latency, security, and operational support.
- Create reusable templates for agent specifications, skill specifications, eval protocols, workflow scorecards, incident reviews, production readiness checklists, and workflow-level success metrics.
- Mentor FDEs, analysts, engineers, and implementation partners on eval-driven development, tool-based workflow design, observable agent operations, and production-quality AI systems.
- Identify recurring failure modes across agents and tools, and turn them into tests, standards, instrumentation, documentation, and platform improvements.