Multi-model orchestration, prompt engineering, and production AI workflows. Not just using AI tools — architecting the systems behind them, and evaluating output with two decades of production experience.
"I evaluate AI output because I know what correct looks like. When a model generates code, designs, or analysis, I can tell whether it's right — not because I ran a test suite, but because I've spent 20 years building the same kinds of systems by hand. That judgment is what separates AI operators from AI architects."
A compact example of the rubric I use when reviewing AI output: not just whether it looks plausible, but whether it would survive production.
An AI model produces a mobile game upgrade screen: hero art, currency balances, CTA hierarchy, reward copy, and layout annotations. My review separates surface polish from production readiness.
"The final score is never a vibe check. I return specific pass/fail notes, severity, suggested fixes, and the reason each issue matters to players, developers, or the business."
Daily workflow spanning GPT-4o, o1, o3, Claude, Grok, and Gemini. Each model selected for its strengths — reasoning, creative generation, code review, evaluation — and composed into coherent pipelines.
Built WAP (multi-persona AI agent platform) and run production autonomous agents across multiple workspaces. Agents handle brand development, project management, infrastructure, and client communication.
System prompt architecture, chain-of-thought design, persona crafting, and evaluation frameworks. Every agent I deploy has carefully engineered prompts calibrated through iterative testing.
DALL-E, Midjourney, and generative workflows integrated into production creative pipelines. AI-generated assets that meet professional art direction standards — because I'm also an Art Director.
Deep experience with the OpenAI API ecosystem: Chat Completions, Assistants, function calling, vision, DALL-E, embeddings. Custom Vercel gateway routing to OpenAI services.
Code review, design critique, content evaluation, and quality assurance for AI-generated outputs. Two decades of domain expertise across engineering, art, and product make me a rigorous evaluator.
Hands-on daily with production AI systems. Every rating comes from real deployment experience.
Custom-built AI platform with multiple specialized personas, each with distinct system prompts, capabilities, and evaluation criteria. Orchestrates complex workflows across different AI models.
Runs two autonomous AI agents across workspaces coordinating brand development, project management, and infrastructure for 5+ simultaneous client engagements at ironreach.com. Real production AI, not a demo.
I bring two decades of cross-domain expertise to AI evaluation, system design, and agent architecture.