Exposing complex AI evaluation frameworks to AI agents via the Model Context Protocol (MCP) enables a new paradigm in which agents self-improve in a controllable manner. Unlike the often unstable, naive self-criticism loops, MCP-accessible evaluation frameworks can provide a persistence layer that stabilizes and standardizes how an agent's progress toward plan fulfillment is measured.
In this talk, we show how an MCP-enabled evaluation engine already allows agents to self-improve independently of their architecture or framework, and why this approach holds promise as a cornerstone of rigorous agent development.
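To make the idea concrete, here is a minimal sketch of what an MCP-exposed evaluation engine could look like, assuming the official MCP Python SDK's FastMCP helper. The `score_against_plan` scorer and the JSONL score history are hypothetical placeholders for illustration, not Scorable's actual engine.

```python
# Minimal sketch: an evaluation engine exposed to agents as an MCP server.
# Assumes the official MCP Python SDK; the scoring logic and the JSONL
# persistence below are illustrative placeholders only.
import json
import time
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("evaluation-engine")
HISTORY = Path("eval_history.jsonl")  # persistence layer for progress tracking


def score_against_plan(plan: str, output: str) -> float:
    """Hypothetical scorer: fraction of plan steps reflected in the output."""
    steps = [s.strip() for s in plan.splitlines() if s.strip()]
    if not steps:
        return 0.0
    hits = sum(1 for s in steps if s.lower() in output.lower())
    return hits / len(steps)


@mcp.tool()
def evaluate(plan: str, output: str) -> dict:
    """Score an agent's output against its plan and persist the result."""
    score = score_against_plan(plan, output)
    record = {"ts": time.time(), "score": score}
    with HISTORY.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record


@mcp.tool()
def progress_history(last_n: int = 10) -> list[dict]:
    """Return recent scores so an agent can check it is actually improving."""
    if not HISTORY.exists():
        return []
    return [json.loads(line) for line in HISTORY.read_text().splitlines()[-last_n:]]


if __name__ == "__main__":
    mcp.run()  # serve over stdio so any MCP-capable agent can connect
```

Because the scores live behind the protocol rather than inside the agent's context window, any MCP-capable agent can query the same history and treat it as the source of truth for whether its revisions are improving.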
Watch the Recording
This presentation was delivered at the AI Engineer World's Fair by Ari Heljakka. Watch the full session below:
Key Topics Covered
- Model Context Protocol (MCP) and its role in standardized agent evaluation.
- Stabilizing agent frameworks through controlled, measurable self-improvement loops.
- Persistent progress tracking: Using MCP-accessible frameworks as a source of truth for agents (see the sketch after this list).
- Framework independence: How evaluations allow agents to improve regardless of their underlying architecture.
- The path to rigor: Moving from "vibe checks" to enterprise-grade agent development.
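The self-improvement pattern these topics describe reduces to a short control loop. The sketch below is illustrative only: `generate_revision` and `evaluate` are hypothetical stand-ins for the agent's own generation step and for a call to an MCP-exposed evaluation tool such as the one sketched above, and the threshold and iteration cap are arbitrary choices.

```python
# Illustrative control loop for measurable, bounded self-improvement.
# `generate_revision` and `evaluate` are hypothetical callables standing in
# for the agent's generation step and an MCP evaluation tool call.
from typing import Callable


def improve_until_good_enough(
    plan: str,
    draft: str,
    generate_revision: Callable[[str, str, float], str],
    evaluate: Callable[[str, str], float],
    threshold: float = 0.9,
    max_iters: int = 5,
) -> tuple[str, list[float]]:
    """Revise a draft until the external evaluator says the plan is fulfilled."""
    history: list[float] = []
    output = draft
    for _ in range(max_iters):
        score = evaluate(plan, output)   # external, persistent measure of progress
        history.append(score)
        if score >= threshold:
            break                        # plan considered fulfilled
        output = generate_revision(plan, output, score)
    return output, history
```

Because the scoring lives outside the agent, the same loop works unchanged with whichever framework produces `generate_revision`, which is what framework independence means in practice.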
About the Speaker
Ari Heljakka is the Founder and CEO of Scorable. In this talk, he shares insights from Scorable's pioneering work on agent evaluation and its early adoption of the Model Context Protocol to bridge the gap between AI action and AI assessment.
Subscribe to our newsletter for the latest technical insights on MCP, agent evaluation, and the evolving GenAI stack.
