Parallel Agent Interaction

After you can launch a single agent environment, the next step is to run many agent tasks in parallel. In this setting, each sample gets its own sandbox, the model interacts with that sandbox over multiple turns, and Uni-Agent collects the resulting trajectories and rewards.

This page uses a SWE agent workflow as the running example. You will prepare SWE-Bench data, run model-environment interaction with multiple workers, and verify the generated solutions.

The inference and verification scripts for this page live under examples/agent_interaction.

Reference results on SWE-Bench Verified with Uni-Agent:

Model	Inference Config	Uni-Agent
Qwen3-Coder-30B-A3B-Instruct	temp=0.8, topp=0.9, tp=4, 100 turns, 128K context	49.2 (Avg@4)
Qwen3-Coder-480B-A35B-Instruct	temp=0.8, topp=0.9, tp=16, 500 turns, 256K context	64.2 (Avg@4)
Qwen3-Coder-Next	temp=0.8, topp=0.9, tp=16, 300 turns, 128K context	67.6 (Avg@4)
Qwen3.5-4B	temp=0.8, topp=0.9, tp=4, 100 turns, 64K context	45.2 (Avg@1)
Qwen3.5-9B	temp=1.0, topp=0.7, tp=4, 100 turns, 64K context	53.8 (Avg@1)
Qwen3.5-9B	temp=1.0, topp=0.95, tp=4, 200 turns, 128k context	65.6 (Avg@1)
Qwen3.5-35B-A3B	temp=1.0, topp=0.7, tp=4, 300 turns, 128K context	68.4 (Avg@1)

Reference results on SWE-bench Multilingual with Uni-Agent:

Model	Inference Config	Uni-Agent
Qwen3-Coder-30B-A3B-Instruct	200 turns, 128K context	32.3 (Avg@1)

Reference results on Terminal-Bench v2 with Uni-Agent:

Model	Inference Config	Uni-Agent
Qwen3.6-35B-A3B	temp=1.0, topp=0.95, tp=8, 200K context	42.53 (Avg@1)

Avg@N reports the average pass rate over N rollouts per task.

Step 1: Prepare the dataset

Start with the dataset. A parallel interaction sample needs the prompt, the sandbox setup, and the reward metadata required for verification.

Use examples/data_preprocess/swe_bench_verified.py to fetch SWE-Bench Verified and build a Parquet file in the format Uni-Agent expects. Set DEPLOYMENT to match the sandbox backend you plan to use, because the preprocessing step writes backend-specific image names. The commands below use Modal as the example backend.

DEPLOYMENT=modal python examples/data_preprocess/swe_bench_verified.py --local-save-dir ~/data/swe_agent

The script writes ~/data/swe_agent/swe_bench_verified_<deployment>.parquet, for example ~/data/swe_agent/swe_bench_verified_modal.parquet.

Step 2: Run parallel inference

Once the dataset is ready, use parallel_infer.py to run the agent loop over many samples. Uni-Agent loads the model, starts multiple agent workers, creates a sandbox for each active task, and reports the mean reward score.

Single-Node

DATA_PATH=~/data/swe_agent/swe_bench_verified_modal.parquet
AGENT_CONFIG=examples/agent_interaction/agent_config_modal.yaml

python examples/agent_interaction/parallel_infer.py \
    --data-path $DATA_PATH \
    --model-path ~/models/Qwen3-Coder-30B-A3B-Instruct \
    --agent-config-path $AGENT_CONFIG \
    --num-workers 8 \
    --max-turns 100 \
    --max-samples 4

--num-workers: number of parallel agent environments. Tune this to your GPU resources and sandbox quota.
--max-samples: cap the number of dataset rows to run. Use -1 for the full dataset.
--n: number of rollouts per prompt.

Multi-node / Ray job submission

To run on a Ray cluster, submit the same script with ray job submit and provide a runtime environment YAML. Put backend credentials in that file, for example MODAL_TOKEN_ID and MODAL_TOKEN_SECRET for Modal, or VEFAAS_FUNCTION_ID, VEFAAS_FUNCTION_ROUTE, VOLCE_ACCESS_KEY, and VOLCE_SECRET_KEY for veFaaS. See examples/agent_interaction/runtime_env.yaml for an example.

ray job submit --no-wait \
    --runtime-env examples/agent_interaction/runtime_env.yaml \
    --working-dir . \
    -- python3 examples/agent_interaction/parallel_infer.py \
    --data-path ~/data/swe_agent/swe_bench_verified_modal.parquet \
    --model-path ~/models/Qwen3-Coder-30B-A3B-Instruct \
    --agent-config-path examples/agent_interaction/agent_config_modal.yaml \
    --nnodes 4 \
    --n-gpus-per-node 8 \
    --max-samples -1

Edit runtime_env.yaml to set your credentials, and do not commit real secrets.

Agent config

Uni-Agent groups the environment, tool, and interaction parameters into a single agent config. This example uses examples/agent_interaction/agent_config_modal.yaml; use examples/agent_interaction/agent_config_vefaas.yaml if you run on veFaaS.

Below is the main shape of the config:

# examples/agent_interaction/agent_config_modal.yaml

- name: swe_agent
  _target_: uni_agent.agent_loop.UniAgentLoop
  concurrency: 64
  log_dir: /tmp/swebench_qwen3_coder
  mask_abnormal_exit_traj: false

  interaction:
    action_timeout: 300
    max_turns: 100

  env:
    deployment:
      type: modal
      startup_timeout: 600
      runtime_timeout: 300
      deployment_timeout: 3600
      # If your machine needs a proxy to connect to Modal
      proxy: http://<proxy-host>:<proxy-port>
    env_variables:
      PIP_PROGRESS_BAR: "off"
      PIP_CACHE_DIR: "~/.cache/pip"
      PAGER: "cat"
      MANPAGER: "cat"
      LESS: "-R"
      TQDM_DISABLE: "1"
      GIT_PAGER: "cat"

  tools:
    - name: str_replace_editor
    - name: execute_bash
    - name: submit

  reward:
    eval_timeout: 600

concurrency limits the number of in-flight agent loops.
interaction controls per-action timeout and max turns.
env.deployment defines the default sandbox backend. Per-sample fields such as image and post-setup command come from tools_kwargs.env. If your machine needs a proxy to connect to Modal, set proxy to your proxy address.
tools lists the tools installed into each sandbox and exposed to the model.
reward provides default reward settings. Per-sample reward metadata comes from tools_kwargs.reward.