# Train a Search Agent

This page walks through the end-to-end search agent example under `examples/search_agent`. The agent is trained on the [ASearcher](https://github.com/inclusionAI/ASearcher) dataset and learns to answer open-domain questions by calling a self-hosted LocalWiki retrieval service.

The example uses two tools:

- `search`: query Wikipedia passages through LocalWiki, or use its `crawl` command to fetch full passages for selected Wikipedia URLs.
- `finish`: submit the final answer for reward computation.

Training uses the same fully asynchronous `verl` stack described in the agent training guide. The difference is the task: instead of editing code in a sandbox, the agent repeatedly searches, reads, reasons, and submits an answer.

---

## Workflow

The full workflow has three steps:

1. Preprocess ASearcher data into Uni-Agent Parquet format.
2. Start the LocalWiki retrieval service.
3. Submit the fully async training job.

The wrapper script `examples/search_agent/run_localwiki_and_train.sh` handles steps 2 and 3 together.

---

## Step 1: Prepare the ASearcher Dataset

Use `examples/data_preprocess/asearcher.py` to convert raw ASearcher JSON or JSONL data into Parquet files:

```bash
python examples/data_preprocess/asearcher.py \
    --input_json /path/to/asearcher.jsonl \
    --local_save_dir ~/uni_agent_data/data/asearcher_uni_processed \
    --train_rows 8192 \
    --test_rows 100
```

This writes:

- `~/uni_agent_data/data/asearcher_uni_processed/train.parquet`
- `~/uni_agent_data/data/asearcher_uni_processed/test.parquet`

Each row contains:

- `prompt`: the system and user messages.
- `agent_name`: set to `search_agent`, matching `examples/search_agent/agent_config.yaml`.
- `extra_info.tools_kwargs.reward`: the ground-truth answer used by the search reward.

---

## Step 2: Prepare LocalWiki

The `search` and `crawl` tools call a LocalWiki HTTP service backed by a FAISS index and BGE-M3 embeddings. The service provides:

- `/retrieve`: semantic search over Wikipedia passages.
- `/crawl`: full-passage lookup by Wikipedia URL.

Follow `uni_agent/tools/search/localwiki/README.md` to prepare the retrieval artifacts. The recommended path is to download the prebuilt FAISS index and corpus:

```bash
export DATA_ROOT=${HOME}/uni_agent_data

hf download begunner/wikipedia-2024-06-bge-m3-faiss-ivf \
    --repo-type dataset \
    --local-dir "$DATA_ROOT/wiki24"

cd "$DATA_ROOT/wiki24"
cat wiki24_faiss.index.part?? > wiki24_faiss.index
mv "$DATA_ROOT/wiki24/preprocessed" "$DATA_ROOT/wiki24/wiki24_preprocessed"
```

You also need the retrieval model:

```bash
hf download BAAI/bge-m3 --local-dir "$DATA_ROOT/model/bge-m3"
```

The wrapper script starts LocalWiki for you, so you do not need to start the server manually for training.

---

## Step 3: Run Training

Start from the repository root with a running Ray cluster. Set `DATA_ROOT` to the directory that contains the processed ASearcher data, model checkpoint, and LocalWiki artifacts:

```bash
DATA_ROOT=~/uni_agent_data \
bash examples/search_agent/run_localwiki_and_train.sh
```

The expected layout is:

```text
${DATA_ROOT}/
├── data/asearcher_uni_processed/
│   ├── train.parquet
│   └── test.parquet
├── model/
│   ├── Qwen3-30B-A3B-Thinking-2507/
│   └── bge-m3/
└── wiki24/
    ├── wiki24_faiss.index
    ├── wiki24_data.jsonl
    └── wiki24_preprocessed/
```

The wrapper does the following:

1. Starts `uni_agent/tools/search/localwiki/run_localwiki.sh`.
2. Waits for `http://127.0.0.1:8001/docs` to become reachable.
3. Patches `examples/search_agent/agent_config.yaml` with the Ray head IP.
4. Submits `examples/search_agent/train_fully_async_128K.sh` with `ray job submit`.
5. Keeps the LocalWiki process alive for the training job.

LocalWiki logs are written under `${LOG_DIR:-logs}/localwiki_.log`.

---

## Key Files

- `examples/search_agent/agent_config.yaml`: agent loop config. It uses host deployment, the `hermes` tool parser, `search` and `finish` tools, and the `search` reward.
- `examples/search_agent/runtime_env.yaml`: Ray runtime env for packaging Uni-Agent, `verl`, Python dependencies, and environment variables.
- `examples/search_agent/train_fully_async_128K.sh`: fully async GRPO training script with a 128K response budget.
- `examples/search_agent/run_localwiki_and_train.sh`: wrapper that starts LocalWiki and submits the training job.

The training script automatically resolves the Ray head IP and writes a temporary agent config where:

```yaml
RETRIEVAL_SERVICE_URL: "http://${RAY_HEAD_IP}:8001/retrieve"
CRAWL_SERVICE_URL: "http://${RAY_HEAD_IP}:8001/crawl"
```

---

## Useful Overrides

Common environment variables:

- `DATA_ROOT`: root directory for data, model checkpoints, and LocalWiki artifacts.
- `LOCALWIKI_PORT`: LocalWiki service port, default `8001`.
- `LOCALWIKI_READY_TIMEOUT`: how long the wrapper waits for LocalWiki startup, default `300` seconds.
- `LOG_DIR`: where LocalWiki logs are written.
- `NNODES_ROLLOUT`, `NNODES_TRAIN`, `NGPUS_PER_NODE`: Ray cluster shape for fully async training.

Common training settings in `train_fully_async_128K.sh`:

- `rollout_n`: number of rollouts per prompt.
- `max_prompt_length`, `max_response_length`: context budget.
- `actor_rollout_ref.rollout.agent.num_workers`: number of agent rollout workers.
- `staleness_threshold`, `trigger_parameter_sync_step`, `require_batches`, `partial_rollout`: fully async scheduling behavior.
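
As a rough illustration of how `LOCALWIKI_PORT` and the Ray head IP combine into the LocalWiki endpoints, here is a minimal Python sketch. The helper name `localwiki_urls`, its dict-based interface, and the `docs` key are hypothetical and not part of the repository. Only the default port `8001` and the `/retrieve`, `/crawl`, and `/docs` paths come from this page.

```python
from typing import Mapping


def localwiki_urls(head_ip: str, env: Mapping[str, str]) -> dict[str, str]:
    """Build LocalWiki endpoint URLs (hypothetical helper, not repository code)."""
    # LOCALWIKI_PORT falls back to 8001, the default documented on this page.
    port = int(env.get("LOCALWIKI_PORT", "8001"))
    base = f"http://{head_ip}:{port}"
    return {
        "RETRIEVAL_SERVICE_URL": f"{base}/retrieve",
        "CRAWL_SERVICE_URL": f"{base}/crawl",
        # The wrapper polls the /docs path to decide that the service is up.
        "docs": f"{base}/docs",
    }


if __name__ == "__main__":
    import os

    # Print the endpoints a worker on this machine would use.
    for name, url in localwiki_urls("127.0.0.1", os.environ).items():
        print(f"{name}={url}")
```
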

---

## Inference Only

If you only want to run rollouts without training, start LocalWiki first:

```bash
DATA_ROOT=~/uni_agent_data \
bash uni_agent/tools/search/localwiki/run_localwiki.sh
```

Then run parallel inference with the search agent config:

```bash
python examples/agent_interaction/parallel_infer.py \
    --data-path ~/uni_agent_data/data/asearcher_uni_processed/test.parquet \
    --model-path ~/uni_agent_data/model/Qwen3-30B-A3B-Thinking-2507 \
    --agent-config-path examples/search_agent/agent_config.yaml \
    --engine vllm \
    --tensor-parallel-size 4 \
    --num-workers 8 \
    --max-turns 64 \
    --max-samples 4
```

Make sure `RETRIEVAL_SERVICE_URL` and `CRAWL_SERVICE_URL` point to the LocalWiki server reachable by the rollout workers.

---

## Output

During training, metrics such as reward, response length, tool-call counts, and validation generations are logged under the `search_agent` project. The reward is computed by `uni_agent/reward/search.py`, which extracts the submitted `finish` answer and compares it against the ASearcher ground truth.
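
To make the comparison step concrete, the sketch below scores a submitted answer against a ground-truth string. It is not the code in `uni_agent/reward/search.py`, whose exact matching rule is not described here. It assumes a normalized exact match (lowercasing, stripping punctuation and English articles), a common convention for open-domain QA, and the function names are hypothetical.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_reward(submitted: str, ground_truth: str) -> float:
    """Return 1.0 when the normalized answers match, else 0.0 (assumed rule)."""
    return 1.0 if normalize_answer(submitted) == normalize_answer(ground_truth) else 0.0
```

Under this assumed rule, `exact_match_reward("The Eiffel Tower.", "eiffel tower")` scores as a match, while any answer that differs after normalization receives zero reward.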