Train a Search Agent

This page walks through the end-to-end search agent example under examples/search_agent. The agent is trained on the ASearcher dataset and learns to answer open-domain questions by calling a self-hosted LocalWiki retrieval service.

The example uses two tools:

search: query Wikipedia passages through LocalWiki, or use its crawl command to fetch full passages for selected Wikipedia URLs.
finish: submit the final answer for reward computation.

Training uses the same fully asynchronous verl stack described in the agent training guide. The difference is the task: instead of editing code in a sandbox, the agent repeatedly searches, reads, reasons, and submits an answer.

Workflow

The full workflow has three steps:

Preprocess ASearcher data into Uni-Agent Parquet format.
Start the LocalWiki retrieval service.
Submit the fully async training job.

The wrapper script examples/search_agent/run_localwiki_and_train.sh handles steps 2 and 3 together.

Step 1: Prepare the ASearcher Dataset

Use examples/data_preprocess/asearcher.py to convert raw ASearcher JSON or JSONL data into Parquet files:

python examples/data_preprocess/asearcher.py \
    --input_json /path/to/asearcher.jsonl \
    --local_save_dir ~/uni_agent_data/data/asearcher_uni_processed \
    --train_rows 8192 \
    --test_rows 100

This writes:

~/uni_agent_data/data/asearcher_uni_processed/train.parquet
~/uni_agent_data/data/asearcher_uni_processed/test.parquet

Each row contains:

prompt: the system and user messages.
agent_name: set to search_agent, matching examples/search_agent/agent_config.yaml.
extra_info.tools_kwargs.reward: the ground-truth answer used by the search reward.

Step 2: Prepare LocalWiki

The search and crawl tools call a LocalWiki HTTP service backed by a FAISS index and BGE-M3 embeddings. The service provides:

/retrieve: semantic search over Wikipedia passages.
/crawl: full-passage lookup by Wikipedia URL.

Follow uni_agent/tools/search/localwiki/README.md to prepare the retrieval artifacts. The recommended path is to download the prebuilt FAISS index and corpus:

export DATA_ROOT=${HOME}/uni_agent_data

hf download begunner/wikipedia-2024-06-bge-m3-faiss-ivf \
    --repo-type dataset \
    --local-dir "$DATA_ROOT/wiki24"

cd "$DATA_ROOT/wiki24"
cat wiki24_faiss.index.part?? > wiki24_faiss.index
mv "$DATA_ROOT/wiki24/preprocessed" "$DATA_ROOT/wiki24/wiki24_preprocessed"

You also need the retrieval model:

hf download BAAI/bge-m3 --local-dir "$DATA_ROOT/model/bge-m3"

The wrapper script starts LocalWiki for you, so you do not need to start the server manually for training.

Step 3: Run Training

Start from the repository root with a running Ray cluster. Set DATA_ROOT to the directory that contains the processed ASearcher data, model checkpoint, and LocalWiki artifacts:

DATA_ROOT=~/uni_agent_data \
bash examples/search_agent/run_localwiki_and_train.sh

The expected layout is:

${DATA_ROOT}/
├── data/asearcher_uni_processed/
│   ├── train.parquet
│   └── test.parquet
├── model/
│   ├── Qwen3-30B-A3B-Thinking-2507/
│   └── bge-m3/
└── wiki24/
    ├── wiki24_faiss.index
    ├── wiki24_data.jsonl
    └── wiki24_preprocessed/

The wrapper does the following:

Starts uni_agent/tools/search/localwiki/run_localwiki.sh.
Waits for http://127.0.0.1:8001/docs to become reachable.
Patches examples/search_agent/agent_config.yaml with the Ray head IP.
Submits examples/search_agent/train_fully_async_128K.sh with ray job submit.
Keeps the LocalWiki process alive for the training job.

LocalWiki logs are written under ${LOG_DIR:-logs}/localwiki_<timestamp>.log.

Key Files

examples/search_agent/agent_config.yaml: agent loop config. It uses host deployment, the hermes tool parser, search and finish tools, and the search reward.
examples/search_agent/runtime_env.yaml: Ray runtime env for packaging Uni-Agent, verl, Python dependencies, and environment variables.
examples/search_agent/train_fully_async_128K.sh: fully async GRPO training script with a 128K response budget.
examples/search_agent/run_localwiki_and_train.sh: wrapper that starts LocalWiki and submits the training job.

The training script automatically resolves the Ray head IP and writes a temporary agent config where:

RETRIEVAL_SERVICE_URL: "http://${RAY_HEAD_IP}:8001/retrieve"
CRAWL_SERVICE_URL: "http://${RAY_HEAD_IP}:8001/crawl"

Useful Overrides

Common environment variables:

DATA_ROOT: root directory for data, model checkpoints, and LocalWiki artifacts.
LOCALWIKI_PORT: LocalWiki service port, default 8001.
LOCALWIKI_READY_TIMEOUT: how long the wrapper waits for LocalWiki startup, default 300 seconds.
LOG_DIR: where LocalWiki logs are written.
NNODES_ROLLOUT, NNODES_TRAIN, NGPUS_PER_NODE: Ray cluster shape for fully async training.

Common training settings in train_fully_async_128K.sh:

rollout_n: number of rollouts per prompt.
max_prompt_length, max_response_length: context budget.
actor_rollout_ref.rollout.agent.num_workers: number of agent rollout workers.
staleness_threshold, trigger_parameter_sync_step, require_batches, partial_rollout: fully async scheduling behavior.

Inference Only

If you only want to run rollouts without training, start LocalWiki first:

DATA_ROOT=~/uni_agent_data \
bash uni_agent/tools/search/localwiki/run_localwiki.sh

Then run parallel inference with the search agent config:

python examples/agent_interaction/parallel_infer.py \
    --data-path ~/uni_agent_data/data/asearcher_uni_processed/test.parquet \
    --model-path ~/uni_agent_data/model/Qwen3-30B-A3B-Thinking-2507 \
    --agent-config-path examples/search_agent/agent_config.yaml \
    --engine vllm \
    --tensor-parallel-size 4 \
    --num-workers 8 \
    --max-turns 64 \
    --max-samples 4

Make sure RETRIEVAL_SERVICE_URL and CRAWL_SERVICE_URL point to the LocalWiki server reachable by the rollout workers.

Output

During training, metrics such as reward, response length, tool-call counts, and validation generations are logged under the search_agent project. The reward is computed by uni_agent/reward/search.py, which extracts the submitted finish answer and compares it against the ASearcher ground truth.