🧠 AI Tooling

AI tools that
actually ship.

Multi-model orchestration, prompt engineering, and production AI workflows. Not just using AI tools — architecting the systems behind them, and evaluating output with two decades of production experience.

Evaluator proof: I review AI output against production standards for code correctness, UX clarity, visual direction, and business fit.
Luke Litman headshot
5+
LLMs Used Daily
2
Production AI Agents
7+
Vercel Deployments
21+
GitHub Repos

"I evaluate AI output because I know what correct looks like. When a model generates code, designs, or analysis, I can tell whether it's right — not because I ran a test suite, but because I've spent 20 years building the same kinds of systems by hand. That judgment is what separates AI operators from AI architects."

— On AI Evaluation
Case Study

How I evaluate AI-generated work

A compact example of the rubric I use when reviewing AI output: not just whether it looks plausible, but whether it would survive production.

🧪

Scenario: Game UI Generated by AI

An AI model produces a mobile game upgrade screen: hero art, currency balances, CTA hierarchy, reward copy, and layout annotations. My review separates surface polish from production readiness.

UX Review Visual QA Game Systems Implementation Notes

Evaluation Criteria

  • Task fit: Does the screen solve the actual player/job-to-be-done?
  • Information hierarchy: Are currency, cost, reward, and next action immediately legible?
  • Production feasibility: Can the design be implemented with reusable components and sane asset budgets?
  • Brand consistency: Does the output match tone, visual system, and platform constraints?
  • Failure modes: What would break under localization, small screens, missing data, or live-ops variants?

"The final score is never a vibe check. I return specific pass/fail notes, severity, suggested fixes, and the reason each issue matters to players, developers, or the business."

— AI Evaluation Workflow
Capabilities

AI expertise across the full stack

🔗

Multi-Model Orchestration

Daily workflow spanning GPT-4o, o1, o3, Claude, Grok, and Gemini. Each model selected for its strengths — reasoning, creative generation, code review, evaluation — and composed into coherent pipelines.

🤖

Agent Architecture

Built WAP (multi-persona AI agent platform) and run production autonomous agents across multiple workspaces. Agents handle brand development, project management, infrastructure, and client communication.

🎯

Prompt Engineering

System prompt architecture, chain-of-thought design, persona crafting, and evaluation frameworks. Every agent I deploy has carefully engineered prompts calibrated through iterative testing.

🖼️

Generative AI Production

DALL-E, Midjourney, and generative workflows integrated into production creative pipelines. AI-generated assets that meet professional art direction standards — because I'm also an Art Director.

OpenAI Ecosystem

Deep experience with the OpenAI API ecosystem: Chat Completions, Assistants, function calling, vision, DALL-E, embeddings. Custom Vercel gateway routing to OpenAI services.

📊

AI Output Evaluation

Code review, design critique, content evaluation, and quality assurance for AI-generated outputs. Two decades of domain expertise across engineering, art, and product make me a rigorous evaluator.

Daily Toolkit

Models & platforms

OpenAI Ecosystem

GPT-4o GPT-4 o1 o3 DALL-E 3 Assistants API Function Calling Embeddings Vision

Other Models

Claude (Anthropic) Grok (xAI) Gemini (Google) Midjourney

Infrastructure & Tools

Vercel GitHub Python JavaScript / TypeScript REST APIs Slack Integrations Linear Google Workspace
Character Stats

Spec Sheet

Hands-on daily with production AI systems. Every rating comes from real deployment experience.

🧠 AI & Prompt Engineering

Prompt Engineering
Master
Agent Architecture
Advanced
Multi-Model Orchestration
Advanced
AI Output Evaluation
Master
Generative AI (Images)
Advanced
RAG / Embeddings
Proficient

🤖 AI Platforms & Models

OpenAI (GPT-4o, o1, o3)
Master
DALL-E 3
Advanced
Claude (Anthropic)
Advanced
Midjourney
Proficient
Gemini (Google)
Familiar
Grok (xAI)
Familiar
Vibe Familiar Proficient Advanced Master
Selected Projects

AI systems in production

🤖

WAP — Multi-Persona Agent Platform

AI Architecture · OpenAI API · Agent Design

Custom-built AI platform with multiple specialized personas, each with distinct system prompts, capabilities, and evaluation criteria. Orchestrates complex workflows across different AI models.

⚒️

IronReach — AI-Powered Brand Studio

Agent Orchestration · Production AI · Multi-Client

Runs two autonomous AI agents across workspaces coordinating brand development, project management, and infrastructure for 5+ simultaneous client engagements at ironreach.com. Real production AI, not a demo.

The Differentiator

Why my AI evaluation is better

Domain Depth

  • 20 years of software architecture — I know when generated code will break at scale
  • Professional art director — I know when generated visuals miss the brief
  • Shipped game developer — I know when game logic is subtly wrong
  • Product leader — I know when a feature spec has gaps

Multi-Model Judgment

  • I use 5+ models daily and know each one's strengths and failure modes
  • I can select the right model for the right task, not just default to GPT-4
  • I architect prompts as systems, not one-off queries
  • I evaluate outputs against real-world standards, not just "does it look right"

Need an AI architect who
actually understands the output?

I bring two decades of cross-domain expertise to AI evaluation, system design, and agent architecture.

Download Resume → Contact Me