# Parallel Agent Interaction After you can launch a single agent environment, the next step is to run many agent tasks in parallel. In this setting, each sample gets its own sandbox, the model interacts with that sandbox over multiple turns, and Uni-Agent collects the resulting trajectories and rewards. This page uses a **SWE agent** workflow as the running example. You will prepare SWE-Bench data, run model-environment interaction with multiple workers, and verify the generated solutions. The inference and verification scripts for this page live under `examples/agent_interaction`. **Reference results on SWE-Bench Verified with Uni-Agent:** | **Model** | Inference Config | **Uni-Agent** | | ------------------------------ | -------------------------------------------------- |:-------------:| | Qwen3-Coder-30B-A3B-Instruct | temp=0.8, topp=0.9, tp=4, 100 turns, 128K context | **49.2** (Avg@4) | | Qwen3-Coder-480B-A35B-Instruct | temp=0.8, topp=0.9, tp=16, 500 turns, 256K context | **64.2** (Avg@4) | | Qwen3-Coder-Next | temp=0.8, topp=0.9, tp=16, 300 turns, 128K context | **67.6** (Avg@4) | | Qwen3.5-4B | temp=0.8, topp=0.9, tp=4, 100 turns, 64K context | **45.2** (Avg@1) | | Qwen3.5-9B | temp=1.0, topp=0.7, tp=4, 100 turns, 64K context | **53.8** (Avg@1) | | Qwen3.5-35B-A3B | temp=1.0, topp=0.7, tp=4, 300 turns, 128K context | **68.4** (Avg@1) | **Reference results on Terminal-Bench v2 with Uni-Agent:** | **Model** | Inference Config | **Uni-Agent** | | ------------------ | --------------------------------------------------- |:-------------:| | Qwen3.6-35B-A3B | temp=1.0, topp=0.95, tp=8, 200K context | **42.53** (Avg@1) | `Avg@N` reports the average pass rate over `N` rollouts per task. --- ## Step 1: Prepare the dataset Start with the dataset. A parallel interaction sample needs the prompt, the sandbox setup, and the reward metadata required for verification. Use `examples/data_preprocess/swe_bench_verified.py` to fetch [SWE-Bench Verified](https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified) and build a Parquet file in the format Uni-Agent expects. Set `DEPLOYMENT` to match the sandbox backend you plan to use, because the preprocessing step writes backend-specific image names. The commands below use Modal as the example backend. ```bash DEPLOYMENT=modal python examples/data_preprocess/swe_bench_verified.py --local-save-dir ~/data/swe_agent ``` The script writes `~/data/swe_agent/swe_bench_verified_.parquet`, for example `~/data/swe_agent/swe_bench_verified_modal.parquet`. --- ## Step 2: Run parallel inference Once the dataset is ready, use `parallel_infer.py` to run the agent loop over many samples. Uni-Agent loads the model, starts multiple agent workers, creates a sandbox for each active task, and reports the mean reward score. ### Single-Node ```bash DATA_PATH=~/data/swe_agent/swe_bench_verified_modal.parquet AGENT_CONFIG=examples/agent_interaction/agent_config_modal.yaml python examples/agent_interaction/parallel_infer.py \ --data-path $DATA_PATH \ --model-path ~/models/Qwen3-Coder-30B-A3B-Instruct \ --agent-config-path $AGENT_CONFIG \ --num-workers 8 \ --max-turns 100 \ --max-samples 4 ``` - `--num-workers`: number of parallel agent environments. Tune this to your GPU resources and sandbox quota. - `--max-samples`: cap the number of dataset rows to run. Use `-1` for the full dataset. - `--n`: number of rollouts per prompt. ### Multi-node / Ray job submission To run on a Ray cluster, submit the same script with `ray job submit` and provide a runtime environment YAML. Put backend credentials in that file, for example `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` for Modal, or `VEFAAS_FUNCTION_ID`, `VEFAAS_FUNCTION_ROUTE`, `VOLCE_ACCESS_KEY`, and `VOLCE_SECRET_KEY` for veFaaS. See `examples/agent_interaction/runtime_env.yaml` for an example. ```bash ray job submit --no-wait \ --runtime-env examples/agent_interaction/runtime_env.yaml \ --working-dir . \ -- python3 examples/agent_interaction/parallel_infer.py \ --data-path ~/data/swe_agent/swe_bench_verified_modal.parquet \ --model-path ~/models/Qwen3-Coder-30B-A3B-Instruct \ --agent-config-path examples/agent_interaction/agent_config_modal.yaml \ --nnodes 4 \ --n-gpus-per-node 8 \ --max-samples -1 ``` Edit `runtime_env.yaml` to set your credentials, and do not commit real secrets. ### Agent config Uni-Agent groups the environment, tool, and interaction parameters into a single agent config. This example uses `examples/agent_interaction/agent_config_modal.yaml`; use `examples/agent_interaction/agent_config_vefaas.yaml` if you run on veFaaS. Below is the main shape of the config: ```yaml # examples/agent_interaction/agent_config_modal.yaml - name: swe_agent _target_: uni_agent.agent_loop.UniAgentLoop concurrency: 64 log_dir: /tmp/swebench_qwen3_coder mask_abnormal_exit_traj: false interaction: action_timeout: 300 max_turns: 100 env: deployment: type: modal startup_timeout: 600 runtime_timeout: 300 deployment_timeout: 3600 # If your machine needs a proxy to connect to Modal proxy: http://: env_variables: PIP_PROGRESS_BAR: "off" PIP_CACHE_DIR: "~/.cache/pip" PAGER: "cat" MANPAGER: "cat" LESS: "-R" TQDM_DISABLE: "1" GIT_PAGER: "cat" tools: - name: str_replace_editor - name: execute_bash - name: submit reward: eval_timeout: 600 ``` - `concurrency` limits the number of in-flight agent loops. - `interaction` controls per-action timeout and max turns. - `env.deployment` defines the default sandbox backend. Per-sample fields such as image and post-setup command come from `tools_kwargs.env`. If your machine needs a proxy to connect to Modal, set `proxy` to your proxy address. - `tools` lists the tools installed into each sandbox and exposed to the model. - `reward` provides default reward settings. Per-sample reward metadata comes from `tools_kwargs.reward`.