AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving

1. Introduction

Recent advances in Large Language Models (LLMs) and Large Multimodal Models (LMMs) have moved these models beyond simple dialogue toward sophisticated reasoning, so that they can respond not only to straightforward questions but also to complex, multi-step queries.

However, current LLMs remain largely disconnected from real-world environments because they lack integration with interactive tools, which limits their ability to perform grounded, general-purpose, and complex tasks.

AgentOrchestra addresses these challenges through a hierarchical multi-agent framework that integrates high-level planning with modular agent collaboration, inspired by the way a conductor orchestrates a symphony.

Key Principles

Extensibility
Multimodality
Modularity
Coordination

2. Architecture

Planning Agent

Central coordinator that decomposes complex objectives and delegates sub-tasks to specialized agents.

Deep Researcher

Conducts thorough research on specified topics, retrieving and synthesizing high-quality information.

Browser Use

Automates browser operations, supporting web search, information extraction, and data collection.

Deep Analyzer

Performs in-depth analysis of input information, extracting key insights and potential requirements.

General Tool Calling

Provides a general-purpose interface for invoking various tools and APIs with function calling support.

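A general-purpose tool-calling interface can be pictured as a registry that maps tool names to callables and dispatches a model-emitted function call. The sketch below is minimal. It assumes the call arrives as JSON of the form `{"name": ..., "arguments": {...}}`, a common function-calling convention. `ToolRegistry`, `register`, and `dispatch` are hypothetical names and are not AgentOrchestra's actual API.

```python
import json
from typing import Any, Callable, Dict


class ToolRegistry:
    """Hypothetical name -> callable registry for function calling."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        # Decorator that records a function under the given tool name.
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._tools[name] = fn
            return fn

        return decorator

    def dispatch(self, call_json: str) -> Any:
        # Parse an assumed {"name": ..., "arguments": {...}} payload and invoke the tool.
        call = json.loads(call_json)
        tool = self._tools.get(call["name"])
        if tool is None:
            raise KeyError(f"unknown tool: {call['name']}")
        return tool(**call.get("arguments", {}))


registry = ToolRegistry()


@registry.register("add")
def add(a: float, b: float) -> float:
    return a + b


result = registry.dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}')
print(result)  # 5
```

Keyword arguments are unpacked straight from the JSON `arguments` object. A production interface would normally validate them against a declared parameter schema first.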

MCP Manager Agent

Enables intelligent tool evolution through automated creation, dynamic retrieval, and systematic reuse of MCP tools.

MCP Manager Agent

Problem Statement

The rapid expansion of AI agent applications has sharply increased the complexity and diversity of the Model Context Protocol (MCP) tools these applications require. Traditional approaches that rely on manual tool development face significant challenges: development inefficiency, version inconsistency, and limited adaptability to emerging requirements.

Solution

The MCP Manager Agent addresses these limitations through intelligent tool evolution: it automatically creates new tools, dynamically retrieves relevant ones, and systematically reuses existing ones. This shifts the approach from static tool provisioning to adaptive management of the tool ecosystem.

Core Capabilities

Tool Retrieval

Keyword pre-filtering strategy to efficiently match tasks with relevant tools from the library.

Tool Creation

Automated generation of MCP-compliant tools through intent analysis, synthesis, and validation phases.

Tool Reuse

Comprehensive tool registry with persistence, versioning, and lifecycle tracking capabilities.
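Retrieval and reuse can be illustrated together with a small in-memory registry. Each registered tool keeps a version history. Retrieval first pre-filters tools by keyword overlap between the task and each tool's description.

This is a sketch under stated assumptions:
- The `MCPToolRegistry` class, its methods, and the overlap-count ranking are hypothetical.
- Persistence, lifecycle tracking, stop-word removal, and the real MCP tool schema are omitted.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    # Lowercase word tokens used for keyword matching.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class ToolRecord:
    name: str
    description: str
    versions: List[str] = field(default_factory=list)  # tool source, one entry per version

    @property
    def latest_version(self) -> int:
        return len(self.versions)


class MCPToolRegistry:
    """Hypothetical in-memory registry; a real one would persist its records."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolRecord] = {}

    def register(self, name: str, description: str, source: str) -> int:
        # Re-registering an existing name appends a new version instead of overwriting.
        record = self._tools.setdefault(name, ToolRecord(name, description))
        record.description = description
        record.versions.append(source)
        return record.latest_version

    def prefilter(self, task: str, top_k: int = 3) -> List[str]:
        # Keyword pre-filter: keep tools sharing at least one token with the task,
        # ranked by the number of shared tokens (ties broken by name).
        task_tokens = tokenize(task)
        scored = []
        for record in self._tools.values():
            overlap = len(task_tokens & tokenize(record.description))
            if overlap > 0:
                scored.append((-overlap, record.name))
        return [name for _, name in sorted(scored)[:top_k]]


registry = MCPToolRegistry()
registry.register("pdf_reader", "extract text from a pdf file", "v1 source")
registry.register("weather", "fetch current weather for a city", "v1 source")
registry.register("pdf_reader", "extract text and tables from a pdf file", "v2 source")
print(registry.prefilter("extract the tables in this pdf"))
```

The pre-filter is cheap and deliberately coarse. Its job is to shrink the candidate set before any more expensive matching step, such as LLM-based ranking, is applied.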

Tool Creation Workflow

Intent Analysis

Parse user task intentions and extract functional requirements, input-output specifications, and operational constraints.

Tool Synthesis

Generate executable MCP-compliant tool implementations with parameterized scripts and error handling.

Validation

Multi-stage evaluation protocol assessing tool correctness, performance characteristics, and integration compatibility.

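Registration

Store validated tools in the tool registry, where they are persisted, versioned, and tracked for later retrieval and reuse.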
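The four phases can be sketched as a pipeline. Synthesis produces candidate source code, validation executes that code against test cases, and only passing tools are registered.

The sketch is minimal and simplifies several things:
- The LLM-driven intent analysis and synthesis are replaced by a hypothetical fixed template (`analyze_intent`, `synthesize_tool`).
- The `ToolSpec` fields are assumptions.
- Validation checks only functional correctness, not the performance and integration checks described above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class ToolSpec:
    # Hypothetical intent-analysis output: name, parameters, body, and I/O test cases.
    name: str
    params: List[str]
    body: str  # expression computing the result (stands in for LLM synthesis)
    tests: List[Tuple[Dict[str, Any], Any]]


def analyze_intent(task: str) -> ToolSpec:
    # Placeholder for LLM-based intent analysis; recognizes a single intent.
    text = task.lower()
    if "celsius" in text and "fahrenheit" in text:
        return ToolSpec(
            name="c_to_f",
            params=["celsius"],
            body="celsius * 9 / 5 + 32",
            tests=[({"celsius": 0}, 32.0), ({"celsius": 100}, 212.0)],
        )
    raise ValueError("unsupported intent in this sketch")


def synthesize_tool(spec: ToolSpec) -> str:
    # Emit Python source for the tool, with basic error handling.
    return (
        f"def {spec.name}({', '.join(spec.params)}):\n"
        "    try:\n"
        f"        return {spec.body}\n"
        "    except TypeError as exc:\n"
        "        raise ValueError('invalid arguments') from exc\n"
    )


def validate(spec: ToolSpec, source: str) -> Optional[Callable[..., Any]]:
    # Compile the candidate and run every test case; reject on any mismatch.
    namespace: Dict[str, Any] = {}
    exec(source, namespace)  # a real system should sandbox generated code
    fn = namespace[spec.name]
    for kwargs, expected in spec.tests:
        if fn(**kwargs) != expected:
            return None
    return fn


registry: Dict[str, Callable[..., Any]] = {}


def create_tool(task: str) -> Optional[str]:
    spec = analyze_intent(task)
    fn = validate(spec, synthesize_tool(spec))
    if fn is None:
        return None
    registry[spec.name] = fn  # registration
    return spec.name


name = create_tool("Convert a Celsius temperature to Fahrenheit")
print(name, registry[name](37))
```

Gating registration on validation means a faulty synthesized tool never enters the reusable library.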

3. Experiments

GAIA Benchmark Test Results

System                           Score (%)
AgentOrchestra (ours)            83.06
AgentOrchestra w/o MCP (ours)    79.07
Aworld                           81.73
Su Zero Ultra                    80.40
h2oGPTe Agent                    79.73
desearch                         78.07
Alita                            75.42
Langfun Agent                    73.09
o3-deep-research                 68.67
JoyAgent-Genie                   65.12
o4-mini-deep-research            59.33
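Two quantities follow directly from the GAIA test table: the MCP ablation gap and the margin over the strongest other listed system. The ablation gap is the full AgentOrchestra score minus the w/o MCP variant. The calculation below only restates the reported scores and introduces no new results.

```python
# GAIA test scores as reported in the table (no new results).
scores = {
    "AgentOrchestra": 83.06,
    "AgentOrchestra (w/o MCP)": 79.07,
    "Aworld": 81.73,
    "Su Zero Ultra": 80.40,
    "h2oGPTe Agent": 79.73,
    "desearch": 78.07,
    "Alita": 75.42,
    "Langfun Agent": 73.09,
    "o3-deep-research": 68.67,
    "JoyAgent-Genie": 65.12,
    "o4-mini-deep-research": 59.33,
}

ours = scores["AgentOrchestra"]
ablation = scores["AgentOrchestra (w/o MCP)"]
others = {k: v for k, v in scores.items() if not k.startswith("AgentOrchestra")}

mcp_gap = round(ours - ablation, 2)             # score difference with vs. without MCP
margin = round(ours - max(others.values()), 2)  # lead over the best other system
above_ablation = [k for k, v in others.items() if v > ablation]
print(mcp_gap, margin, above_ablation)
```

On these numbers, removing MCP lowers the score by 3.99 points. That is more than the 1.33-point lead over the next-best system, so the w/o MCP variant would rank behind Aworld, Su Zero Ultra, and h2oGPTe Agent.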

SimpleQA Benchmark

Evaluation on short, fact-seeking questions to assess factual accuracy.

Score: 95.3

GAIA Benchmark (Validation Set)

Evaluation on real-world assistant tasks requiring web search, tool use, and multi-step reasoning.

Score: 82.42

HLE Benchmark (Humanity's Last Exam)

Evaluation on expert-level, closed-ended questions spanning a broad range of academic subjects, designed to test advanced reasoning.

Score: 25.9

Key Results

Performance Improvements

• Consistently outperforms flat-agent and monolithic baselines
• Superior task success rate and adaptability
• Effective hierarchical organization and role specialization

Scalability Benefits

• Modular design enables easy integration of new agents
• Flexible orchestration through explicit sub-goal formulation
• Adaptive role allocation for dynamic task requirements

Paper & Resources

Research Paper

Read our full paper "AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving" published on arXiv.

Code Repository

Access the complete implementation, examples, and documentation on GitHub.

Authors

Wentao Zhang

Skywork AI

Liang Zeng

Skywork AI

Yuzhen Xiao

Skywork AI

Yongcong Li

Skywork AI

Ce Cui

Skywork AI

Yilei Zhao

Nanyang Technological University

Rui Hu

Skywork AI

Yang Liu

Skywork AI

Yahui Zhou

Skywork AI

Bo An

Nanyang Technological University

Skywork AI

Planning Agent Workflow

• Input: the user objective.
• Analyze the user objective and its context.
• Decompose the complex task into sub-tasks.
• Create an execution plan with steps.
• Assign sub-tasks to specialized agents.
• Monitor execution progress and collect feedback.
• Plan adaptation needed? Yes: update the plan and reassign tasks. No: continue the current execution.
• All tasks completed? Yes: aggregate the results and complete. No: return to monitoring.
• Output: objective achieved.
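The planning loop in this workflow can be sketched as sequential code. This is a simplified sketch under several assumptions:
- Sub-tasks are plain strings.
- Specialized agents are callables keyed by hypothetical role names.
- Plan adaptation means reassigning a failed sub-task to a fallback agent.

The real Planning Agent uses an LLM for decomposition and adaptation; `decompose` here is only a stub.

```python
from typing import Callable, Dict, List, Tuple

Agent = Callable[[str], Tuple[bool, str]]  # returns (success, output)


def decompose(objective: str) -> List[Tuple[str, str]]:
    # Stub for LLM-based decomposition into (role, sub-task) pairs.
    return [
        ("researcher", f"find sources on {objective}"),
        ("analyzer", f"summarize findings on {objective}"),
    ]


def run_plan(
    objective: str, agents: Dict[str, Agent], fallback: str, max_retries: int = 2
) -> List[str]:
    plan = decompose(objective)  # analyze, decompose, and build the plan
    results: List[str] = []
    for role, subtask in plan:
        attempts = 0
        while True:
            ok, output = agents[role](subtask)  # assign and monitor
            if ok:
                results.append(output)  # continue current execution
                break
            attempts += 1
            if attempts > max_retries:
                raise RuntimeError(f"sub-task failed: {subtask}")
            role = fallback  # adapt the plan: reassign the sub-task
    return results  # aggregate results once all sub-tasks are done


def researcher(task: str) -> Tuple[bool, str]:
    return False, "search engine unavailable"  # simulated failure


def generalist(task: str) -> Tuple[bool, str]:
    return True, f"done: {task}"


agents = {"researcher": researcher, "analyzer": generalist, "generalist": generalist}
result = run_plan("GAIA", agents, fallback="generalist")
print(result)
```

Because agents are looked up by role, a new specialized agent can be added by registering another entry in `agents` without changing the loop.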

Deep Researcher Agent Workflow

• Input: the research query.
• Optimize the query with an LLM.
• Run a breadth-first search across multiple search engines.
• Extract key insights and assign relevance scores.
• Generate follow-up queries.
• Depth or time limit reached? Yes: proceed to synthesis. No: continue the recursive search.
• Data processing needed? Yes: process the data with the Python interpreter. No: skip data processing.
• Generate a structured summary with citations.
• Output: research complete.
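The recursive search in this workflow can be sketched as breadth-first expansion over query levels, bounded by a depth limit and a wall-clock budget. Assumptions:
- `search` stands in for querying multiple engines.
- `follow_up` stands in for LLM-generated follow-up queries.
- Relevance is a toy keyword-overlap score rather than LLM scoring.

Query optimization, the Python data-processing step, and citation formatting are omitted.

```python
import time
from collections import deque
from typing import Callable, Dict, List, Tuple

Result = Tuple[str, str]          # (source, snippet) returned by a search
Insight = Tuple[str, str, float]  # (source, snippet, relevance)


def deep_research(
    query: str,
    search: Callable[[str], List[Result]],
    follow_up: Callable[[str, str], List[str]],
    max_depth: int = 2,
    time_budget_s: float = 30.0,
) -> List[Insight]:
    """Breadth-first recursive search bounded by depth and wall-clock time."""
    start = time.monotonic()
    query_terms = set(query.lower().split())
    frontier = deque([(query, 0)])  # FIFO queue of (query, depth): breadth-first
    seen = {query}
    insights: Dict[str, Insight] = {}
    while frontier and time.monotonic() - start < time_budget_s:
        q, depth = frontier.popleft()
        for source, snippet in search(q):
            # Toy relevance: share of the original query terms found in the snippet.
            overlap = query_terms & set(snippet.lower().split())
            relevance = len(overlap) / max(len(query_terms), 1)
            if source not in insights or relevance > insights[source][2]:
                insights[source] = (source, snippet, relevance)
            if depth < max_depth:  # stop expanding at the depth limit
                for nq in follow_up(q, snippet):
                    if nq not in seen:
                        seen.add(nq)
                        frontier.append((nq, depth + 1))
    # Most relevant first; this list would feed the summary-with-citations step.
    return sorted(insights.values(), key=lambda item: item[2], reverse=True)


corpus = {
    "agent orchestration": [("s1", "hierarchical agent orchestration")],
    "hierarchical agent orchestration": [("s2", "planning agent delegates sub-tasks")],
    "planning agent delegates sub-tasks": [("s3", "unrelated text")],
}
found = deep_research(
    "agent orchestration", lambda q: corpus.get(q, []), lambda q, s: [s], max_depth=1
)
print([source for source, _, _ in found])  # ['s1', 's2']
```

Raising `max_depth` to 2 would also reach `s3`, illustrating how the depth limit trades coverage for cost.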

Browser Use Agent Workflow

• Input: the browser task request.
• Parse the action from the central registry.
• Initialize the browser session and state management.
• Execute the parameterized browser action according to its type: navigation (URL navigation and tab management), interaction (DOM manipulation and form filling), or content extraction and media control.
• Execution successful? Yes: return the results and update the state. No: handle the error and apply retry logic.
• Advanced scripting needed? Yes: use the Python interpreter for advanced scripting. No: skip Python processing.
• Output: task completed.

Deep Analyzer Agent Workflow

• Input: the analysis task and source materials.
• Detect and extract multi-format data.
• Process by data type: text and images as text and visual content; audio and video as audio and video content.
• Structure the content for analysis.
• Reason step by step with an LLM.
• Multiple models used? Yes: synthesize the results from multiple models. No: use the single model's results.
• Custom analysis needed? Yes: use the Python interpreter for custom analysis. No: skip custom analysis.
• Generate a coherent analysis summary.
• Output: analysis complete.
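The branching and synthesis steps of this workflow can be sketched as a dispatcher and a combiner. The dispatcher routes each input by file extension to a modality handler, and the combiner merges answers from several models. Assumptions:
- The extension table is hypothetical.
- The handlers only label their input rather than decoding it.
- Majority voting as the synthesis rule is an illustrative choice, not necessarily the agent's actual method.

```python
from collections import Counter
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical extension-to-modality table used for format detection.
MODALITY_BY_EXT = {
    ".txt": "text", ".md": "text", ".pdf": "text",
    ".png": "image", ".jpg": "image",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video", ".mov": "video",
}


def detect_modality(path: str) -> str:
    return MODALITY_BY_EXT.get(Path(path).suffix.lower(), "unknown")


def process_text_visual(path: str) -> str:
    return f"[text/visual content of {path}]"  # placeholder for parsing or captioning


def process_audio_video(path: str) -> str:
    return f"[transcript of {path}]"  # placeholder for speech-to-text


HANDLERS: Dict[str, Callable[[str], str]] = {
    "text": process_text_visual,
    "image": process_text_visual,
    "audio": process_audio_video,
    "video": process_audio_video,
}


def structure_inputs(paths: List[str]) -> str:
    # Route each file to the handler for its modality and join the results
    # into a single context for step-by-step LLM reasoning.
    parts = []
    for path in paths:
        handler = HANDLERS.get(detect_modality(path))
        if handler is None:
            raise ValueError(f"unsupported input: {path}")
        parts.append(handler(path))
    return "\n".join(parts)


def synthesize(answers: List[str]) -> str:
    # One model: its own (normalized) answer. Several: majority vote,
    # with ties going to the answer seen first.
    if not answers:
        raise ValueError("no model answers to synthesize")
    normalized = [a.strip().lower() for a in answers]
    return Counter(normalized).most_common(1)[0][0]


context = structure_inputs(["report.pdf", "chart.png", "call.mp3"])
print(context)
print(synthesize(["42", "42 ", "41"]))  # 42
```

Normalizing answers before voting keeps trivial formatting differences, such as trailing spaces or letter case, from splitting the vote.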